Preface


Python is an object-oriented, interpreted programming language useful for a
wide range of tasks, from small scripts to entire applications. It is freely available
in binary or source code form and can be used royalty-free on all major platforms
including Windows, Macintosh, Linux, FreeBSD, and Solaris.

Compared with most programming languages, Python is very easy to learn and is 
considered by many to be the language of choice for beginning programmers. 
Experienced developers, however, don’t outgrow it: they enjoy lower
maintenance costs without missing out on any features found in other major
languages such as C++, Java, or Perl.

Python is well known for its usefulness as a rapid application development tool, 
and we often hear of Python projects that finish in hours or days instead of the 
weeks or months that would have been required with traditional programming
languages. It boasts a rich, full-featured set of standard libraries as well as the ability
to interface with libraries in other languages like C++.

Despite being incredibly powerful and enabling very rapid application
development, the real reason we love to use Python is that it’s just plain fun. Python is like a
lever — with it, you can do some pretty heavy lifting with very little effort. It frees 
you from lots of annoying, mundane work, and before long you begin to wonder 
how you endured your pre-Python days. 


About This Book 

Although Python is a great first programming language, in this book we do assume 
that you already have some programming experience. 

The first section of the book introduces you to Python and tells you everything you
need to know to get started. If you’re new to Python, then that section is definitely 
the place to start; otherwise, it serves as a useful language reference with many 
examples. 

We’ve worked hard to ensure that the book works well as a quick reference. Often
the quickest way to understand a feature is to see it in use: Flip through the book’s
pages and you’ll see that they are dripping with code examples. 
All the examples in the book work and are things you can try on your own. Where
possible, the chapters also build complete applications that have useful and
interesting purposes. We’ve gone to great lengths to explain not only how to use each
module or feature but also why such a feature is useful.


What You Need 

Besides the book, all you need is a properly installed copy of Python. Appendix A 
lists some Python resources available online, but a good place to start is 
www.python.org; it has prebuilt versions of Python for all major platforms as well
as the Python source code itself. Once you’ve downloaded Python you’ll be
underway in a matter of minutes.

If you’re a user of Microsoft Windows, you can download an excellent distribution
of Python from www.activestate.com. ActiveState provides a single download
that includes Python, a free development environment and debugger, and Win32
extensions. 

PythonWare (www.pythonware.com) also offers a distribution of Python that
comes bundled with popular third-party Python modules. PythonWare’s version
peacefully coexists with older versions of Python, and the small distribution size
makes for a quick download. 

No matter which site you choose, Python is free, so go download it and get started. 


How the Book Is Organized 

We’ve tried to organize the book so that related topics are close together. If you find
the topic of one chapter particularly interesting, chances are that the chapters 
before and after it will pique your interest too. 

Part I: The Python Language 

The first chapter in this section is a crash course in Python programming. If you 
have many programming languages under your belt or just want to whet your 
appetite, try out the examples in that chapter to get a feel for Python’s syntax and 
powerful features. 

The remaining chapters in this first section cover the same material as Chapter 1 
but in much greater detail. They work equally well as an initial tutorial of the
Python language and as a language reference for seasoned Pythonistas.


Part II: Files, Data Storage, and 
Operating System Services 

This part covers Python’s powerful string and regular expression handling features 
and shows you how to access files and directories. In this section we also cover 
how Python enables you to easily write objects to disk or send them across
network connections, and how to access relational databases from your programs.

Part III: Networking and the Internet 

Python is an ideal tool for XML processing, CGI scripting, and many other
networking tasks. This part guides you through Internet programming with Python, whether
you need to send e-mail, run a Web site, or just amass the world’s largest .mp3 
collection. 

Part IV: User Interfaces and Multimedia 

This part covers Tkinter and wxPython, two excellent tools for building a GUI in 
Python. In this part, we also cover Python’s text interface tools, including support 
for Curses. This section also delves into Python’s support for graphics and sound. 

Part V: Advanced Python Programming 

This part answers the questions that come up in larger projects: How do I create 
multithreaded Python applications? How can I optimize my code, or glue it to C 
libraries? How can I make my program behave correctly in other countries? We also 
cover Python’s support for number crunching and security. 

Part VI: Deploying Python Applications 

This part covers what you need to know to deploy your Python programs quickly 
and painlessly. Python’s distribution utilities are great for bundling and distributing
applications on many platforms. 

Part VII: Platform-Specific Support 

Sometimes it’s nice to take advantage of an operating system’s strengths. This part 
addresses some Windows-specific topics (like accessing the registry), and some 
UNIX-specific topics (like file descriptors). 
Appendixes

Appendix A is a guide to online Python resources. Appendix B introduces you to 
IDLE and PythonWin — two great IDEs for developing Python programs. It also 
explains how to make Emacs handle Python code. 


Conventions Used in This Book 

Source code, function definitions, and interactive sessions appear in monospaced
font. Comments appear in bold monospaced font preceded by a hash mark for
easy reading. For example, this quick interpreter session checks the version of the 
Python interpreter. The >>> at the start of a line is the Python interpreter prompt 
and the text after the prompt is what you would type: 

>>> import sys # This is a comment. 

>>> print sys.version 

2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] 

References to variables in function definitions appear in italics. For example, the 
function random.choice(seq) chooses a random element from the sequence seq
and returns it. 

We divided up the writing of this book’s chapters between ourselves. So,
throughout the book’s body, we use “I” (not “we”) to relate our individual opinions and
experiences.


What the Icons Mean 

Throughout the book, we’ve used icons in the left margin to call your attention to
points that are particularly important. 

New Feature This icon indicates that the material discussed is new to Python 2.0 or Python 2.1.

Note The Note icons tell you that something is important: perhaps a concept that may
help you master the task at hand or something fundamental for understanding
subsequent material.

Tip Tip icons indicate a more efficient way of doing something or a technique that
may not be obvious.

Caution Caution icons mean that the operation we're describing can cause problems if
you're not careful. 

Cross-Reference We use the Cross-Reference icon to refer you to other sections or chapters that
have more to say on a subject.


Visit Us! 

We’ve set up a Web site for the book at www.pythonapocrypha.com. On the site
you’ll find additional information, links to Python Web sites, and all the code
samples from the book (so you can be lazy and not type them in). The Web site also has
a section where you can give feedback on the book, and we post answers to common
questions.

Have fun and enjoy the book!
Python in
an Hour 


CHAPTER 1


Python is a rich and powerful language, but also one that
is easy to learn. This chapter gives an overview of 
Python’s syntax, its useful data-types, and its unique features. 

As you read, please fire up the Python interpreter, and try out 
some of the examples. Feel free to experiment, tinker, and 
wander away from the rest of the tour group. Everything in 
this chapter is repeated, in greater detail, in later chapters, so 
don’t worry too much about absorbing everything at once. 
Try some things out, get your feet wet, and have fun! 

Jumping In: Starting the 
Python Interpreter 

The first thing to do, if you haven’t already, is to install
Python. You can download Python from www.python.org. As
of this writing, the latest versions of Python are 2.0 (stable)
and 2.1 (still in beta).

You can start the Python interpreter from the command line. 
Change to the directory where the interpreter lives, or add 
the directory to your path. Then type: 

python 

On UNIX, Python typically lives in /usr/local/bin; on
Windows, Python probably lives in c:\python20. 

On Windows, you can also bring the interpreter up from 
Start ➪ Programs ➪ Python 2.0 ➪ Python (command line).
Once you start the interpreter, Python displays something like this:

Python 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.

>>> 

The interpreter displays the >>> prompt to show that it’s ready for you to type in 
some Python. And so, in the grand tradition of programming books everywhere, we 
proceed to the “Hello world” example:

>>> print "Hello world!"
Hello world!

To exit the interpreter, type the end-of-file character (Ctrl-Z on Windows, or Ctrl-D 
on Linux) and press Enter. 

Note You may prefer to interact with the interpreter in IDLE, the standard Python IDE.

IDLE features syntax coloring, a class browser, and other handy features. See 
Appendix B for tips on starting and using IDLE. 


Experimenting with Variables 
and Expressions 

Python’s syntax for variables and expressions is close to what you would see in C 
or Java, so you can skim this section if it starts looking familiar. However, you 
should take note of Python’s loose typing (see below). 


Pocket calculator 

Python understands the standard arithmetic operators, including +, -, / (division),
and * (multiplication). The Python interpreter makes a handy calculator: 

>>> 8/2
4 

>>> 5+4*6 
29 

Note that the second example evaluates to 29 (and not 54); the interpreter multiplies 4
by 6 before adding 5. Python uses operator precedence rules to decide what to do 
first. You can control order explicitly by using parentheses: 

>>> (5+4)*6 
54 

In practice, it’s often easiest to use parentheses (even when they aren’t required) to 
make code more readable. 
Variables

You can use variables to hold values over time. For example, this code computes
how long it takes to watch every episode of Monty Python’s Flying Circus (including
the two German episodes of Monty Python’s Fliegende Zirkus):

>>> NumberOfEpisodes=47 
>>> EpisodeLength=0.5 

>>> PythonMarathonLength=(NumberOfEpisodes*EpisodeLength) 

>>> PythonMarathonLength 
23.5 

A variable is always a reference to a value. Variables do not have types, but objects 
do. (Python is loosely typed; the same variable may refer to an integer value in the 
morning and a string value in the afternoon.)
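A minimal sketch of loose typing in action: the name Snack (just an illustrative name) first refers to an integer and then to a string, and the built-in type function reports the type of whichever object the name refers to at that moment.

```python
# The same name can refer to objects of different types over time.
Snack = 42                  # Snack refers to an integer object...
MorningType = type(Snack)   # ...so this is the int type.

Snack = "spam"              # Now Snack refers to a string object...
AfternoonType = type(Snack) # ...so this is the str type.

# The objects carry the types; the name Snack never declared one.
```

After this runs, MorningType is int and AfternoonType is str, even though nothing about the name Snack changed.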

Python does not require variable declarations. However, you cannot access a 
variable until you have assigned it a value. If you try to access an undefined
variable, the interpreter will complain (the wording of the error may be different in
your version of Python): 

>>> print Scrumptious 

Traceback (most recent call last): 

File "<stdin>", line 1, in ? 

NameError: There is no variable named 'Scrumptious' 

This example raised an exception. In Python, most errors are represented by
exception objects that the surrounding code can handle. Chapter 5 describes Python’s
exception-handling abilities.
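As a quick preview of what Chapter 5 covers in depth, here is a minimal sketch that catches the NameError from the example above with a try/except statement, so the program carries on instead of stopping. The variable Message is just an illustrative name, and the sketch assumes Scrumptious has never been assigned:

```python
# Referencing an unassigned name raises NameError; the except
# clause handles it instead of letting the program stop.
try:
    Scrumptious
except NameError:
    Message = "Scrumptious has not been assigned yet"
else:
    # Runs only if no exception was raised.
    Message = "Scrumptious exists"
```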

Note Python is case-sensitive. This means that names that are capitalized differently
refer to different variables:

>>> FavoriteColor="blue"
>>> favoritecolor="yellow"
>>> print FavoriteColor,favoritecolor
blue yellow


Defining a Function 

Assume you and some friends go out to dinner and decide to split the bill evenly.
How much should each person pay? Here is a function that calculates each 
person’s share: 

>>> def SplitBill(Bill,NumberOfPeople):
    # The hash character (#) starts a comment. Python
    # ignores everything from # to the end of the line.
    TotalWithTip = Bill * (1.15) # Add a 15% tip.
    return (TotalWithTip / NumberOfPeople)

>>> SplitBill(23.35,3)
8.9508333333333336

The statement def FunctionName(parameter, ...): starts a function definition. I
indented the following four lines to indicate that they are a control block — a 
sequence of statements grouped by a common level of indentation. Together, they 
make up the body of the function definition. 

Python statements with the same level of indentation are grouped together. In this
example, Python knows the function definition ends when it sees a non-indented
line. Indenting grouped statements is common practice in most programming
languages; in Python, indentation is actually part of the syntax. Normally, one
indentation level equals four spaces, and a tab counts as eight spaces.


Running a Python Program 

A text file consisting of Python code is called a program, or a script, or a module. 
There is little distinction between the three terms — generally a script is smaller 
than a program, and a file designed to be imported (rather than executed directly) 
is called a module. Normally, you name Python code files with a .py extension.

To run a program named spam.py, type the following at a command prompt:

python spam.py 

In Windows, you can run a program by double-clicking it. (If the file association for
the .py extension is not set up at installation time, you can configure it by
right-clicking the script, choosing “Open With...” and then choosing python.exe.)

In UNIX, you can run a script directly by using the “pound-bang hack.” Add this line
at the top of the Python script (replacing the path with the path to env if it’s
different on your system):

#!/usr/bin/env python

Then make the file executable (by running chmod +x <filename>), and you can run
it directly.


Looping and Control 

Listing 1-1 illustrates Python’s looping and conditional statements. It prints out all 
the prime numbers less than 500. 
Listing 1-1: PrimeFinder.py


# Loop over the numbers from 2 to 499 (1 is not prime):
for PrimeTest in range(2,500):
    # Assume PrimeTest prime until proven otherwise:
    IsPrime = 1 # 0 is false, nonzero is true
    # Loop over the numbers from 2 to (PrimeTest-1):
    for TestFactor in range(2,PrimeTest):
        # a % b equals the remainder of a/b:
        if (PrimeTest % TestFactor == 0):
            # TestFactor divides PrimeTest (remainder is 0).
            IsPrime=0
            break # Jump out of the innermost for loop.
    if (IsPrime):
        print PrimeTest


Integer division 

The modulo operator, %, returns the remainder when the first number is divided by
the second. (For instance, 8 % 5 is equal to 3.) If PrimeTest is zero modulo
TestFactor, then this remainder is zero, so TestFactor is one of PrimeTest’s
divisors.
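The built-in divmod function returns the quotient and the remainder together, which makes the link between % and integer division easy to see. A small sketch; the helper Divides is a hypothetical name, not part of PrimeFinder.py:

```python
# divmod(a, b) returns a (quotient, remainder) pair for integers.
Quotient, Remainder = divmod(8, 5)   # 8 == 1*5 + 3

# The remainder is exactly what % computes, and the pieces
# always fit back together as quotient*b + remainder == a.
assert Remainder == 8 % 5
assert Quotient * 5 + Remainder == 8

def Divides(Factor, Number):
    "True if Factor divides Number evenly (remainder is zero)."
    return Number % Factor == 0
```

Divides(3, 9) is true and Divides(4, 9) is false, which is the same zero-remainder test Listing 1-1 applies to each TestFactor.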

In Python, dividing one integer by another returns another integer — the quotient, 
rounded down: 

>>> 8/3 # I want an integer, not the "right answer." 

2 

So, here is a sneaky replacement for the if statement inside the inner loop of
PrimeFinder.py. If TestFactor does not divide PrimeTest evenly, then the quotient
is rounded off, and so the comparison will fail:

if ((PrimeTest/TestFactor)*TestFactor == PrimeTest):

Python uses the float class for floating-point (decimal) numbers. The float
function transforms a value into a float:

>>> 8.0/3.0
2.6666666666666665 

>>> float(8)/float(3) # Give me the "real" quotient. 

2.6666666666666665 
Looping

The for statement sets up a loop — a block of code that is executed many times. 
The function range(startnum,endnum) provides a list of integers starting with
startnum and ending just before endnum.
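A quick check of where range starts and stops. The list() call is there only so the result is a real list on newer Python versions, where range no longer builds one; on the versions this book covers it simply copies the list:

```python
# range(start, end) starts at start and stops just before end.
Numbers = list(range(2, 7))   # [2, 3, 4, 5, 6]; 7 is not included

# With a single argument, range counts up from 0.
FromZero = list(range(4))     # [0, 1, 2, 3]
```

This is why range(2,500) in Listing 1-1 tests every number up to and including 499.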

In the example, PrimeTest takes on each value in the range in order, and the outer
loop executes once for each value of PrimeTest. The inner loop iterates over the
“possible factors” of PrimeTest, starting at 2 and continuing until (PrimeTest-1). 

Branching with if-statements 

The statement if expression: begins a control block that executes only if
expression is true. You can enclose the expression in parentheses. As far as
Python is concerned, the number 0 is false, and any other number is true. 

Note that in a condition, we use the == operator to test for equality. The = operator 
is used only for assignments, and assignments are forbidden within a condition. 
(Here Python differs from C/C++, which allows assignments inside an if-condition, 
even though they are usually a horrible mistake.) 

In an if statement, an else-clause executes when the condition is not true. For
example:

if (MyNumber % 2 == 0):
    print "MyNumber is even!"
else:
    print "MyNumber is odd!"

Breaking and continuing 

The break statement jumps out of a loop. It exits the innermost loop in the current
context. In Listing 1-1, the break statement exits the inner TestFactor loop, and
execution continues with the if (IsPrime) test. The continue statement jumps to the next iteration of a loop.
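Listing 1-1 uses break but not continue. This hypothetical helper, FirstOddSquares, is a small sketch that uses both: continue skips the even numbers, and break ends the loop as soon as enough results have been collected.

```python
def FirstOddSquares(Limit, HowMany):
    "Collect up to HowMany squares of odd numbers below Limit."
    Found = []
    for Number in range(Limit):
        if Number % 2 == 0:
            continue            # Skip even numbers; go on to the next one.
        Found.append(Number * Number)
        if len(Found) == HowMany:
            break               # We have enough; leave the loop early.
    return Found
```

FirstOddSquares(100, 3) stops after 1, 3, and 5 and returns [1, 9, 25]; if the range runs out first, the loop simply ends and returns whatever it found.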

Loops can also be set up using the while statement. The syntax while (expression):
sets up a control block that executes as long as expression is true. For
example:

# print out powers of 2 less than 2000
X=2
while (X<2000):
    print X
    X=X*2
Lists and Tuples

A list is an ordered collection of zero or more elements. An element of a list can be 
any sort of object. You can write lists as a comma-separated collection of values 
enclosed in square brackets. For example: 

FibonacciList=[1,1,2,3,5,8]
FishList=[1,2,"Fish"] # Lists can contain various types.
AnotherList=[1,2,FishList] # Lists can include other lists.
YetAnotherList=[1,2,3,] # Trailing commas are ok.
RevengeOfTheList=[] # The empty list

Tuples 

A tuple is similar to a list. The difference is that a tuple is immutable — it cannot be 
modified. You enclose tuples in parentheses instead of brackets. For example: 

FirstTuple=("spam","spam","bacon","spam")
SecondTuple=() # The empty tuple
LonelyTuple=(5,) # Trailing comma is *required*, since (5) is
                 # just a number in parens, not a tuple.
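To see why that trailing comma matters, compare the two objects directly; this sketch just checks what each expression produces (NotATuple is an illustrative name):

```python
# Parentheses alone only group an expression; the comma makes a tuple.
NotATuple = (5)     # Just the integer 5.
LonelyTuple = (5,)  # A tuple holding one element.
```

Afterwards NotATuple is the plain integer 5, while LonelyTuple is a tuple of length 1.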

Slicing and dicing 

Lists are ordered, so each list element has an index. You can access an element with 
the syntax listname[index]. Note that index numbering begins with zero: 

>>> FoodList=["Spam","Egg","Sausage"]
>>> FoodList[0]
'Spam'
>>> FoodList[2]
'Sausage'
>>> FoodList[2]="Spam" # Modifying list elements in place
>>> FoodList
['Spam', 'Egg', 'Spam']

Sometimes it’s easier to count from the end of the list backwards. You can
access the last item of a list with listname[-1], the second-to-last item with
listname[-2], and so on.
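A short sketch of negative indexing on the FoodList from the previous example; the comparison at the end shows that a negative index -n is shorthand for counting n places back from len(listname):

```python
FoodList = ["Spam", "Egg", "Sausage"]

Last = FoodList[-1]          # 'Sausage'
SecondToLast = FoodList[-2]  # 'Egg'

# FoodList[-1] refers to the same position as FoodList[len(FoodList) - 1].
assert FoodList[-1] == FoodList[len(FoodList) - 1]
```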

You can access a sublist of a list via the syntax listname[start:end]. The sublist
contains the original list elements, starting with index start, up to (but not
including) index end. Both start and end are optional; omitting them makes Python go all
the way to the beginning (or end) of the list. For example:

>>> WordList=["And","now","for","something","completely",
...     "different"]
>>> WordList[0:2] # From index 0 to 2 (not including 2)
['And', 'now']
>>> WordList[2:5]
['for', 'something', 'completely']
>>> WordList[:-1] # All except the last
['And', 'now', 'for', 'something', 'completely']

Substrings 

Lists, tuples, and strings are all sequence types. Sequence types all support indexed
access. So, taking a substring in Python is easy:

>>> Word="pig"
>>> PigLatinWord=Word[1:]+Word[0]+"ay"
>>> PigLatinWord
'igpay'


Immutable types 

Tuples and strings are immutable types. Modifying them in place is not allowed: 

FirstTuple[0] = "Egg" # Error: object does not support item assignment.

You can switch between tuples and lists using the tuple and list functions. So,
although you cannot edit a tuple directly, you can create a new-and-improved tuple:

>>> FoodTuple=("Spam","Egg","Sausage")
>>> FoodList=list(FoodTuple)
>>> FoodList
['Spam', 'Egg', 'Sausage']
>>> FoodList[2]="Spam"
>>> NewFoodTuple=tuple(FoodList)
>>> NewFoodTuple
('Spam', 'Egg', 'Spam')


Dictionaries 

A dictionary is a Python object that cross-references keys to values. A key is an 
immutable object, such as a string. A value can be any object. A dictionary has a 
canonical string representation: a comma-separated list of key-value pairs, enclosed 
in curly braces: {key:value, key:value}. For example:

>>> PhoneDict = {"bob":"555-1212","fred":"555-3345"}
>>> EmptyDict={} # Initialize a new dictionary.
>>> PhoneDict["bob"] # Find bob's phone number.
'555-1212'
>>> PhoneDict["cindy"]="867-5309" # Add an entry.
>>> print "Phone list:",PhoneDict
Phone list: {'fred': '555-3345', 'bob': '555-1212', 'cindy': '867-5309'}
Looking up a value raises an exception if the dictionary holds no value for the key.
The function dictionary.get(key, defaultValue) performs a “safe get”; it looks
up the value corresponding to key, but if there is no such entry, returns
defaultValue.

>>> PhoneDict["luke"] # May raise an exception.
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: luke
>>> PhoneDict.get("joe","unknown")
'unknown'

Often a good default value is the built-in value None. The value None represents
nothing (it is a little Zen-like). The value None is similar to NULL in C (or null in
Java). It evaluates to false.

>>> DialAJoe=PhoneDict.get("joe",None) 

>>> print DialAJoe 
None 


Reading and Writing Files 

To create a file object, use the function open(filename, mode). The mode
argument is a string explaining what you intend to do with the file; typical values
are “w” to write and “r” to read. Once you have a file object, you can read() from it
or write() to it, then close() it. This example creates a simple file on disk:

>>> fred = open("hello","w")
>>> fred.write("Hello world!")
>>> fred.close()
>>> barney = open("hello","r")
>>> FileText = barney.read()
>>> barney.close()
>>> print FileText
Hello world!
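Files often hold several lines of text. Here is a small sketch, assuming a scratch file named hello.txt in the system’s temporary directory (the name is only an example), that writes two lines and reads them back with the readlines() method. The try/finally blocks make sure each file is closed even if something goes wrong in between:

```python
import os
import tempfile

# Hypothetical scratch file in the system's temporary directory.
FileName = os.path.join(tempfile.gettempdir(), "hello.txt")

OutFile = open(FileName, "w")
try:
    OutFile.write("Hello world!\n")
    OutFile.write("Goodbye world!\n")
finally:
    OutFile.close()              # Always release the file.

InFile = open(FileName, "r")
try:
    Lines = InFile.readlines()   # A list of strings, one per line.
finally:
    InFile.close()

os.remove(FileName)              # Clean up the scratch file.
```

Each string in Lines keeps its trailing newline character, so Lines is ["Hello world!\n", "Goodbye world!\n"].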


Sample Program: Word Frequencies 

Different authors use different words. Patterns of word use form a kind of “author 
fingerprint” that is sometimes used as a test of a document’s authenticity. 

Listing 1-2 counts the occurrences of each word in a body of text, and illustrates some
more Python power in the process. (Don’t be intimidated by all the comments; it’s
actually only 26 lines of code.)
Listing 1-2: WordCount.py


# Import the string module, so we can call Python's standard
# string-related functions.
import string

def CountWords(Text):
    "Count how many times each word occurs in Text."
    # A string immediately after a def statement is a
    # "docstring" - a comment intended for documentation.
    WordCount={}
    # We will build up (and return) a dictionary whose keys
    # are the words, and whose values are the corresponding
    # number of occurrences.

    CurrentWord=""
    # To make the job cleaner, add a period at the end of the
    # text; that way, we are guaranteed to be finished with
    # the current word when we run out of letters:
    Text=Text+"."

    # We assume that ' and - don't break words, but any other
    # nonalphabetic character does. This assumption isn't
    # entirely accurate, but it's close enough for us.
    # string.letters is a string of all alphabetic characters.
    PiecesOfWords = string.letters + "'-"

    # Iterate over each character in the text. The
    # function len() returns the length of a sequence,
    # such as a string:
    for CharacterIndex in range(0,len(Text)):
        CurrentCharacter=Text[CharacterIndex]

        # The find() method of a string finds
        # the starting index of the first occurrence of a
        # substring within a string, or returns -1
        # if it doesn't find the substring. The next
        # line of code tests to see whether CurrentCharacter
        # is part of a word:
        if (PiecesOfWords.find(CurrentCharacter)!=-1):
            # Append this letter to the current word.
            CurrentWord=CurrentWord+CurrentCharacter
        else:
            # This character is not a letter.
            if (CurrentWord!=""):
                # We just finished off a word.
                # Convert to lowercase, so "The" and "the"
                # fall in the same bucket.
                CurrentWord = string.lower(CurrentWord)

                # Now increment this word's count.
                CurrentCount=WordCount.get(CurrentWord,0)
                WordCount[CurrentWord]=CurrentCount+1

                # Start a new word.
                CurrentWord=""
    return (WordCount)

if (__name__=="__main__"):
    # Read the text from the file poem.txt.
    TextFile=open("poem.txt","r")
    Text=TextFile.read()
    TextFile.close()

    # Count the words in the text.
    WordCount=CountWords(Text)

    # Alphabetize the word list, and print them all out.
    SortedWords=WordCount.keys()
    SortedWords.sort()
    for Word in SortedWords:
        print Word,WordCount[Word]


Listing 1-3: poem.txt 


Shall I compare thee to a summer's day? 

Thou art more lovely and more temperate: 

Rough winds do shake the darling buds of May, 

And summer's lease hath all too short a date: 
Sometime too hot the eye of heaven shines 
And often is his gold complexion dimmed; 

And every fair from fair sometimes declines, 

By chance or nature's changing course untrimmed; 
But thy eternal summer shall not fade. 

Nor lose possession of that fair thou ow'st: 

Nor shall Death brag thou wander'st in his shade, 
When in eternal lines to time thou grow'st: 

So long as men can breathe, or eyes can see, 

So long lives this, and this gives life to thee.


Listing 1-4: WordCount output 


all 1 
and 5 
art 1 
as 1 
brag 1 

[...omitted for brevity...]
too 2 

untrimmed 1 
wander'st 1 
when 1 
winds 1 
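Listing 1-2 scans the text one character at a time, which shows off loops and string methods. For comparison, here is a shorter sketch of the same idea that swaps in the re (regular expression) module, which Part II covers in depth. The function name CountWordsRe is hypothetical; like Listing 1-2 it treats letters, apostrophes, and hyphens as word characters, but this version only recognizes the ASCII letters A through Z:

```python
import re

def CountWordsRe(Text):
    "Count word occurrences, treating letters, ' and - as word characters."
    WordCount = {}
    # findall returns every maximal run of matching characters.
    for Word in re.findall(r"[A-Za-z'-]+", Text):
        Word = Word.lower()      # "The" and "the" share a bucket.
        WordCount[Word] = WordCount.get(Word, 0) + 1
    return WordCount

Counts = CountWordsRe("And summer's lease hath all too short a date: And")
```

Here Counts["and"] is 2, "summer's" stays one word, and the colon after "date" is not counted as part of it.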









Loading and Using Modules 

Python comes with a collection of libraries to do all manner of useful things. To use
the functions, classes, and variables in another Python module, you must first
import that module with the statement import modulename. (Note: No
parentheses.) After importing a module, you can access any of its members using the syntax
moduleName.itemName. For instance, this line (from the preceding example) calls
the function lower in the module string to convert a string to lowercase.

CurrentWord = string.lower(CurrentWord)

When you import a module, any code at module level (that is, code that isn’t part of
a function or class definition) executes. To set aside code to execute only when
someone runs your script from the command line, you can enclose it in an
if (__name__=="__main__") block, as in Listing 1-2 above.

As an alternative to “import foo,” you can use the syntax from foo import
itemName to import a function or variable all the way into the current namespace.
For example, after you include the line from math import sqrt in a Python script,
you can call the square-root function sqrt directly, instead of calling math.sqrt.
You can even bring in everything from a module with from foo import *. However,
although this technique does save typing, it can become confusing, especially if
you import functions with the same name from several different modules!
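One way to convince yourself that from ... import simply binds an existing object to a new name in your namespace: after both kinds of import, the two names refer to the very same function object.

```python
import math
from math import sqrt

# Both names refer to the same function object in the math module.
assert sqrt is math.sqrt

Hypotenuse = sqrt(3 * 3 + 4 * 4)   # No "math." prefix needed.
```

Hypotenuse comes out as 5.0, exactly as math.sqrt(25) would.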

Note Python does not enforce “privacy” in modules; you can call any of a module’s
functions. It is generally a good idea to be polite and only call those you are
supposed to.


Creating a Class 

Python is an object-oriented language. In fact, every piece of Python data is an 
object. Working with objects in Python is easy, as you will soon see. 

Some quick object jargon 

A class is a mechanism for tying together data and behavior. An instance of a
particular class is called an object. Class instances have certain methods (functions) and
attributes (data values). In Python, all data items behave like objects, even though a 
few base types (like integers) are not actual instances of a class. 

You can derive a class from a parent class; this relationship is called inheritance. 
Instances of the child (derived) class inherit the attributes and methods of the
parent class. The child class may add new methods and attributes, and override
methods of the parent. A class may be derived from more than one parent class; 
this relationship is called multiple inheritance. 
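A minimal sketch of these ideas, using hypothetical classes Animal, Parrot, Talkative, and TalkingParrot. Parrot overrides one parent method and extends another by calling the parent’s version explicitly; TalkingParrot uses multiple inheritance to pick up an extra method from Talkative:

```python
class Animal:
    def __init__(self, Name):
        self.Name = Name

    def Speak(self):
        return self.Name + " makes a noise."

    def Describe(self):
        return "This is " + self.Name + ". " + self.Speak()

class Parrot(Animal):
    # Override the parent's Speak method.
    def Speak(self):
        return self.Name + " says hello."

    # Extend the parent's Describe by calling it explicitly.
    def Describe(self):
        return Animal.Describe(self) + " (It is a parrot.)"

class Talkative:
    def Chatter(self):
        return self.Speak() + " " + self.Speak()

# Multiple inheritance: methods come from both Parrot and Talkative.
class TalkingParrot(Parrot, Talkative):
    pass

Polly = Parrot("Polly")
Chatty = TalkingParrot("Chatty")
```

Polly.Describe() returns "This is Polly. Polly says hello. (It is a parrot.)": the inherited Describe calls self.Speak(), which finds the overriding Parrot version.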
Object-oriented programming (OOP) is a mindset that may take some getting used
to. When inheritance becomes natural, and you start talking about your data in
anthropomorphic terms, you will know that your journey to the OO side is
complete. See the References section for some resources that explain object-oriented
programming in detail.

Object orientation, Python style 

You define a new class with the syntax class ClassName. The control block
following the class statement is the class declaration; it generally consists of
several method definitions. You define a child class (using inheritance) via the syntax
class ClassName(ParentClass).

You create an object via the syntax NewObject = ClassName(). When you create
an object, Python calls its constructor, if any. In Python, a constructor is a member
function with the name __init__. A constructor may require extra parameters
to create an object. If so, you provide them when creating the object: NewObject =
ClassName(param1,param2,...).

Every object method takes, as its first parameter, the argument self, which is a
reference to the object. (Python self is similar to this in C++/Java, but self is
always explicit.)

You do not explicitly declare attributes in Python. An object’s attributes are not
part of the local namespace; in other words, to access an object’s attribute foo in
one of its methods, you must type self.foo.

Keep off the grass: Accessing class members

Attributes and methods are all “public” — they are visible and available outside the 
object. However, to preserve encapsulation, many classes have some attributes or 
methods you should not access directly. The motivation for this is that an object
should be something of a “black box”: code outside the object should only care
what it does, not how it does it. This helps keep code easy to maintain, especially in
big programs.
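Since Python doesn’t enforce privacy, a common convention (a habit, not something the interpreter checks) is to start the names of internal attributes and methods with a single underscore, meaning “please don’t touch.” A small sketch with a hypothetical BankAccount class:

```python
class BankAccount:
    def __init__(self, Balance):
        self._Balance = Balance   # Leading underscore: internal detail.

    def Deposit(self, Amount):
        self._Balance = self._Balance + Amount

    def GetBalance(self):
        # The public way for outside code to read the balance.
        return self._Balance

Account = BankAccount(100)
Account.Deposit(25)
Total = Account.GetBalance()
# Account._Balance is still reachable; the underscore is only a hint.
```

Outside code that sticks to Deposit and GetBalance keeps working even if the class later changes how it stores the balance.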

Example: the point class 

Listing 1-5 defines a class representing a point in the plane (or on a computer 
screen): 
Listing 1-5: Point.py


import math

# The next statement starts our class declaration; the
# function declarations inside the indented control block are
# the class's methods.
class Point:
    # The method __init__ is the class's constructor. It
    # executes when you create an instance of the class.
    # When __init__ takes extra parameters (as it does here),
    # you must supply parameter values in order to create an
    # instance of the class. Writing an __init__ method is
    # optional.
    def __init__(self,X,Y):
        # X and Y are the attributes of this class. You do not
        # have to declare attributes. I like to initialize
        # all my attributes in the constructor, to ensure that
        # the attributes will be available when I need them.
        self.X=X
        self.Y=Y

    def DistanceToPoint(self, OtherPoint):
        "Returns the distance from this point to another"
        SumOfSquares = ((self.X-OtherPoint.X)**2) +\
                       ((self.Y-OtherPoint.Y)**2)
        return math.sqrt(SumOfSquares)

    def IsInsideCircle(self, Center, Radius):
        """Return 1 if this point is inside the circle,
        0 otherwise."""
        if (self.DistanceToPoint(Center)<Radius):
            return 1
        else:
            return 0

# This code tests the point class.
PointA=Point(3,5) # Create a point with coordinates (3,5)
PointB=Point(-4,-4)
# How far is it from point A to point B?
print "A to B:",PointA.DistanceToPoint(PointB)
# What if I go backwards?
print "B to A:",PointB.DistanceToPoint(PointA)
# Who lives inside the circle of radius 5 centered at (3,3)?
CircleCenter=Point(3,3)
print "A in circle:",PointA.IsInsideCircle(CircleCenter,5)
print "B in circle:",PointB.IsInsideCircle(CircleCenter,5)
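Working through the test code's numbers by hand shows what Listing 1-5 should print. The points are A = (3, 5), B = (-4, -4), and the circle center C = (3, 3):

```latex
\begin{aligned}
d(A,B) &= \sqrt{(3-(-4))^2 + (5-(-4))^2} = \sqrt{49 + 81} = \sqrt{130} \approx 11.40 \\
d(A,C) &= \sqrt{(3-3)^2 + (5-3)^2} = 2 < 5 \\
d(B,C) &= \sqrt{(-4-3)^2 + (-4-3)^2} = \sqrt{98} \approx 9.90 > 5
\end{aligned}
```

Distance does not depend on direction, so both distance lines should show about 11.40. "A in circle" should print 1 and "B in circle" should print 0.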
Recommended Reading

If you are new to computer programming, you may find this tutorial useful:

http://www.honors.montana.edu/~jjc/easytut/easytut/.

To learn all about the language on one (large!) page, see the Python Quick
Reference at http://starship.python.net/quick-ref1_52.html.

If you like to learn by tinkering with finished programs, you can download a
wide variety of source code at the Vaults of Parnassus: http://www.vex.net/parnassus/.


Summary 

This wraps up our quick tour of Python. We hope you enjoyed the trip. You now 
know most of Python’s notable features. In this chapter, you: 

- Ran the Python interpreter for easy interaction.

- Grouped statements by indentation level.

- Wrote functions to count words in a body of text.

- Created a handy Point class.

The next chapter digs a little deeper and introduces all of Python’s standard types
and operators.



CHAPTER 2


Identifiers, Variables, and Numeric Types

One of the simplest forms of data on which your programs operate is numbers.
This chapter introduces the numeric data types in Python, such as integers and
floating point numbers, and shows you how to use them together in simple
operations like assignment to variables.

As with Chapter 1, you’ll find it helpful to have a Python interpreter up and
running as you read this and the following chapters. Playing around with the
examples in each section will pique your curiosity and help keep Python’s
features firmly rooted in your brain.




In This Chapter 

Identifiers and 
operators 

Numeric types 

Assigning values to 
variables 



Identifiers and Operators 

Variable names and other identifiers in Python are similar to
those in many other languages: they start with a letter (A-Z or
a-z) or an underscore and are followed by any number
of letters, digits, and underscores. Their length is limited
only by your eagerness to type, and they are case-sensitive
(that is, spam and Spam are different identifiers). Regardless of
length, choose identifiers that are meaningful. (Having said
that, I’ll break that rule for the sake of conciseness in many of
the examples in this chapter.)

The following are some examples of valid and invalid identifiers: 

wordCount
y_axis
errorField2
_logFile
_2            # Technically valid, but not a good idea
7Index        # Invalid, starts with a number
won't_work    # Invalid due to apostrophe character
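The character rule from the previous paragraph can be written as a regular expression. Here is a minimal sketch. is_valid_name is a hypothetical helper, and it checks only the character rule, not the reserved words covered later in this chapter.

```python
import re

# A letter or underscore, then any number of letters, digits, or underscores.
# \Z anchors the match at the very end of the string.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def is_valid_name(name):
    "Return a true value if name follows the identifier character rules"
    return NAME_PATTERN.match(name) is not None

for name in ["wordCount", "y_axis", "_2", "7Index", "won't_work"]:
    print("%-12s %s" % (name, is_valid_name(name)))
```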

Python considers these forms to have special meaning:

_name - Not imported by "from x import *" (see Chapter 6)

__name__ - System name (see Chapter 6)

__name - Private class member (see Chapter 7)


When you’re running the Python interpreter in interactive mode, a single underscore
character (_) is a special identifier that holds the result of the last expression evalu-
ated. This is especially handy when you’re using Python as a desktop calculator:


>>> "Hello"
'Hello'
>>> _
'Hello'
>>> 5+2
7
>>> _ * 2
14
>>> _ + 5
19
>>>


Reserved words 

Although it would make for some interesting source code, you can’t use the following
words as identifiers because they are reserved words in the Python language:


and       del       for       is        raise
assert    elif      from      lambda    return
break     else      global    not       try
class     except    if        or        while
continue  exec      import    pass
def       finally   in        print
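The standard keyword module lets a program ask whether a word is reserved. Here is a minimal sketch. The exact list depends on the interpreter version: newer versions, for example, no longer reserve print or exec, and they add words such as True and None. For that reason, the sketch tests only words that every version reserves. can_use_as_name is a hypothetical helper.

```python
import keyword

def can_use_as_name(word):
    "Return a true value if word is not a reserved word"
    return not keyword.iskeyword(word)

for word in ["while", "lambda", "spam"]:
    print("%-8s reserved: %s" % (word, keyword.iskeyword(word)))

# keyword.kwlist holds every reserved word for the running interpreter.
print(len(keyword.kwlist))
```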



Operators 

Python has the following operators, each of which we’ll discuss in context with the
applicable data types they operate on:

!=   %    &    *    **   /    ^    |    ~
+    -    <    <<   <=   <>   ==   >    >=   >>
Numeric Types

Python has four built-in numeric data types: integers, long integers, floating point 
numbers, and imaginary numbers. 

Integers 

Integers are whole numbers in the range of -2147483648 to 2147483647 (that is, on
a typical 32-bit platform they are signed, 32-bit numbers).

Tip For convenience, the sys module has a maxint member that holds the maximum
positive value of an integer variable:

>>> import sys
>>> sys.maxint
2147483647

In addition to writing integers in the default decimal (base 10) notation, you can
also write integer literals in hexadecimal (base 16) and octal (base 8) notation by
preceding the number with a 0x or 0, respectively:

>>> 300      # 300 in decimal
300
>>> 0x12c    # 300 in hex
300
>>> 0454     # 300 in octal
300


Keep in mind that for decimal numbers, valid digits are 0 through 9. For hexadecimal,
it’s 0 through 9 and A through F, and for octal it’s 0 through 7. If you’re not
familiar with hexadecimal and octal numbering systems, or if you are but they don’t
thrill you, just nod your head and keep moving.
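You can also go the other way. The built-in int function accepts a string of digits and a base, and the %x and %o string formatting codes (see "Formatting strings" in Chapter 3) write a number back out in hex or octal. Here is a short sketch using the values from the example above:

```python
# Parse digit strings in a given base (no 0x or 0 prefix needed).
print(int("300", 10))   # 300
print(int("12c", 16))   # 300
print(int("454", 8))    # 300

# Write 300 back out as hex and octal digits.
print("%x %o" % (300, 300))   # 12c 454
```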

Long integers 

Long integers are similar to integers, except that the maximum and minimum values
of long integers are restricted only by how much memory you have (yes, you
really can have long integers with thousands of digits). To differentiate between the
two types of integers, you append an “L” to the end of long integers:

>>> 200L                    # A long integer literal with a value of 200
200L
>>> 11223344 * 55667788     # Too big for normal integers...
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
OverflowError: integer multiplication
>>> 11223344L * 55667788L   # ...but works with long integers
624778734443072L
Tip The "L" on long integers can be uppercase or lowercase, but do yourself a favor
and always use the uppercase version. The lowercase "l" and the digit one look
too similar, especially if you are tired, behind schedule on a project, or both.

Floating point numbers 

Floating point numbers let you express fractional numeric values such as 3.14159.
You can also include an optional exponent. If you include neither an exponent nor a
decimal point, Python interprets the number as an integer, so to express “the floating
point number two hundred,” write it as 200.0 and not just 200. Here are a few
examples of floating point numbers:

200.05 

9.80665 

.1 

20005e-2 
6.0221367E23 
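Whether a literal has a decimal point or an exponent decides its type, and you can check that type with the built-in type function. Here is a small sketch using some of the literals above:

```python
print(type(200).__name__)     # int: no decimal point, no exponent
print(type(200.0).__name__)   # float: the decimal point makes it floating point
print(type(.1).__name__)      # float: the leading digit is optional

# An exponent also makes a float: 20005e-2 means 20005 times 10 to the -2.
print(20005e-2 == 200.05)     # true: both literals name the same value
```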

Occasionally you may notice what appear to be rounding errors in how Python 
displays floating point numbers: 

>>> 0.3
0.29999999999999999

Don't worry; this display does not indicate a bug. It is just a friendly reminder that
your digital computer only approximates real-world numbers. See "Formatting
strings" in Chapter 3 to learn about printing numbers in a less ugly format.
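Because floating point values are approximations, testing two computed results for exact equality can surprise you. Here is a minimal sketch of the usual workaround, which compares against a small tolerance. close_enough and its 1e-9 default are illustrative choices, not a standard function. The sketch also applies the formatting fix mentioned above.

```python
def close_enough(x, y, tolerance=1e-9):
    "Treat two floats as equal if they differ by less than tolerance"
    return abs(x - y) < tolerance

total = 0.1 + 0.2
print(total == 0.3)               # false: each value is rounded slightly differently
print(close_enough(total, 0.3))   # true
print("%.2f" % total)             # 0.30: formatting hides the tiny error
```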

The valid values for floating point numbers and the accuracy with which Python
uses them are implementation-dependent, although Python uses at least 64-bit,
double-precision math, which is often IEEE 754 compliant.

Imaginary numbers

Unlike many other languages, Python has language-level support for imaginary
numbers, making it trivial to use them in your programs. You form an imaginary
number by appending a “j” to a decimal number (integer or floating point):

3j

2.5e-3j

When you add a real and an imaginary number together, Python recognizes the
result as a complex number and handles it accordingly:

>>> 2 + 5j
(2+5j)
>>> 2 * (2 + 5j)
(4+10j)
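A complex number keeps its two parts as floating point attributes named real and imag, and the built-in complex function builds one from two numbers. Here is a short sketch:

```python
z = 2 + 5j
print(z.real)          # 2.0
print(z.imag)          # 5.0
print(z.conjugate())   # (2-5j): same real part, imaginary part negated
print(complex(2, 5) == z)

# j squared is -1, as in mathematics.
print(1j * 1j)         # (-1+0j)
```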
Manipulating numeric types

You can use most of Python’s operators when working with numeric data types. 

Numeric operators 

Table 2-1 lists operators and how they behave with numeric types. 




Table 2-1
Operations on Numeric Types

Operator  Description           Example Input  Example Output

Unary Operations
+         Plus                  +2             2
-         Minus                 -2             -2
                                -(-2)          2
~         Inversion(1)          ~5             -6

Binary Operations
+         Addition              5 + 7          12
                                5 + 7.0        12.0
-         Subtraction           5 - 2          3
                                5 - 2.0        3.0
*         Multiplication        2.5 * 2        5.0
/         Division              5 / 2          2
                                5 / 2.0        2.5
%         Modulo (remainder)    5 % 2          1
                                7.5 % 2.5      0.0
**        Power                 5 ** 2         25
                                1.2 ** 2.1     1.466...

Binary Bitwise Operations(2)
&         AND                   5 & 2          0
                                11 & 3         3
|         OR                    5 | 2          7
                                11 | 3         11
^         XOR (exclusive-or)    5 ^ 2          7
                                11 ^ 3         8

Shifting Operations(2)
<<        Left bit-shift        5 << 2         20
>>        Right bit-shift       50 >> 3        6

(1) Unary bitwise inversion of a number x is defined as -(x+1).
(2) Numbers used in binary bitwise and shifting operations must be integers or long integers.
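Footnote 1 and the shifting rows follow simple arithmetic identities. ~x equals -(x+1). Shifting left by n multiplies by 2 to the n. Shifting right by n divides by 2 to the n, rounding down. Here is a small sketch that checks these identities over a range of integers. check_bit_identities is a hypothetical helper.

```python
def check_bit_identities(x, n):
    "Return 1 if the inversion and shift identities hold for x and n >= 0"
    inversion_ok = (~x == -(x + 1))
    left_ok = (x << n) == x * 2 ** n
    # divmod's quotient rounds down, just as a right shift does.
    right_ok = (x >> n) == divmod(x, 2 ** n)[0]
    return inversion_ok and left_ok and right_ok

all_ok = 1
for x in range(-20, 21):
    for n in range(6):
        if not check_bit_identities(x, n):
            all_ok = 0
print(all_ok)   # 1
```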


It is important to notice what happens when you mix standard numeric types
(adding an integer and a floating point number, for example). If needed, Python first
coerces (converts) either of the numbers according to these rules (stopping as
soon as a rule is satisfied):

1. If one of the numbers is a complex number, convert the other to a complex 
number too. 

2. If one of the numbers is a floating point number, convert the other to floating 
point. 

3. If one of the numbers is a long integer, convert the other to a long integer. 

4. No previous rule applies, so both are integers, and Python leaves them 
unchanged. 
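You can watch these rules at work by checking the type of a mixed expression's result. Here is a short sketch. The long integer rule is left out so that the sketch also runs on newer interpreters, which have a single integer type.

```python
pairs = [
    ("int + int", 5 + 7),            # rule 4: both integers, unchanged
    ("int + float", 5 + 7.0),        # rule 2: the int becomes a float
    ("float + complex", 5.5 + 2j),   # rule 1: the float becomes complex
    ("int + complex", 5 + 2j),       # rule 1 again
]
for label, value in pairs:
    print("%-16s %-10s %s" % (label, value, type(value).__name__))
```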

Other functions 

Python has a few other built-in functions for working with numeric types, as 
described in the following sections. 

Absolute value - abs 

The abs(x) function takes the absolute value of any integer, long integer, or floating
point number:

>>> abs(-5.0)
5.0
>>> abs(-20L)
20L

When applied to a complex number, this function returns the magnitude of the number,
which is the distance from that point to the origin in the complex plane. Python
calculates the magnitude just like the length of a line in two dimensions: for a complex
number (a + bj), the magnitude is the square root of a squared plus b
squared:

>>> abs(5 - 2j)
5.3851648071345037
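Written out for the example above, the magnitude formula gives:

```latex
|a + bj| = \sqrt{a^2 + b^2}, \qquad |5 - 2j| = \sqrt{5^2 + (-2)^2} = \sqrt{29} \approx 5.385
```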
Convert two numbers to a common type - coerce(x, y)

The coerce function applies the previously explained numeric conversion rules to
two numbers and returns them to you as a tuple (we cover tuples in detail in
Chapter 4):

>>> coerce(5,2L)
(5L, 2L)
>>> coerce(5.5,2L)
(5.5, 2.0)
>>> coerce(5.5,5 + 2j)
((5.5+0j), (5+2j))

Quotient and remainder - divmod(a, b)

This function performs long division on two numbers and returns the quotient and
the remainder:

>>> divmod(5,2)
(2, 1)
>>> divmod(5.5,2)
(2.0, 1.5)
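The quotient q and remainder r that divmod(a, b) returns always satisfy a == q * b + r. The remainder takes the sign of b, so the quotient rounds down even for negative numbers. Here is a brief sketch:

```python
for a, b in [(5, 2), (-7, 2), (7, -2), (5.5, 2)]:
    q, r = divmod(a, b)
    # The identity holds in every case; note how -7 / 2 rounds down to -4.
    print("divmod(%s, %s) = (%s, %s); q*b + r = %s" % (a, b, q, r, q * b + r))
```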

Power - pow(x, y [, z])

The pow function is similar to the power (**) operator in Table 2-1:

>>> pow(5,2)
25
>>> pow(1.2,2.1)
1.4664951016517147

As usual, Python coerces the two numbers to a common type if needed. If the
resulting type can’t express the correct result, Python yells at you:

>>> pow(2.0,-1)    # The coerced type is a floating point.
0.5
>>> pow(2,-1)      # The coerced type is an integer.
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
ValueError: integer to the negative power

An optional third argument to pow specifies the modulo operation to perform on
the result:

>>> pow(2,5)
32
>>> pow(2,5,10)
2
>>> (2 **5) % 10
2
The result is the same as using the power and modulo operators, but Python
arrives at the result more efficiently. (Speedy power-and-modulo is useful in some
types of cryptography.)
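One standard way to get that efficiency is square-and-multiply. The method walks through the exponent's bits, squaring as it goes and reducing modulo z at every step, so the intermediate numbers never grow large. The sketch below shows the technique. power_mod is a hypothetical name, this is not necessarily the exact algorithm Python uses internally, and it handles only non-negative exponents and a positive modulus.

```python
def power_mod(base, exponent, modulus):
    "Compute (base ** exponent) % modulus by repeated squaring"
    result = 1 % modulus          # 1 % 1 is 0, matching pow(x, 0, 1)
    base = base % modulus
    while exponent > 0:
        if exponent & 1:          # low bit set: multiply this power in
            result = (result * base) % modulus
        base = (base * base) % modulus   # square for the next bit
        exponent = exponent >> 1
    return result

print(power_mod(2, 5, 10))   # 2, the same answer as pow(2, 5, 10)
```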


Round - round(x [, n])

This function rounds a floating point number x to the nearest whole number.
Optionally, you can tell it to round to n digits after the decimal point:

>>> round(5.567)
6.0
>>> round(5.567,2)
5.57

Cross-Reference Chapter 31, "Number Crunching," covers several Python modules that deal with
math and numerical data types.


Assigning Values to Variables 

With basic numeric types out of the way, we can take a break before moving on to
other data types, and talk about variables and assignment statements. Python creates
variables the first time you assign a value to them (you never need to explicitly declare
them beforehand), and automatically cleans up the data they reference when they
are no longer needed.

Refer back to “Identifiers and Operators” at the beginning of this chapter for the 
rules regarding valid variable names. 


Simple assignment statements 

The simplest assignment statements in Python have the form variable = value:

>>> a = 5
>>> b = 10
>>> a
5
>>> b
10
>>> a + b
15
>>> a > b
0

Cross-Reference "Understanding References" in Chapter 4 goes into more depth about how and
when Python destroys unneeded data, and "Taking Out the Trash" in Chapter 26
covers the Python garbage collector.
A Python variable doesn’t actually contain a piece of data but merely references a
piece of data. The details and importance of this are covered in Chapter 4, but for
now it’s just important to note that the type of data that a variable refers to can
change at any time:

>>> a = 10 

>>> a # First it refers to an integer. 

10 

>>> a = 5.0 + 2j 

>>> a # Now it refers to a complex number. 

(5+2j) 


Multiple assignment 

Python provides a great shorthand method of assigning values to multiple variables
at the same time:

>>> a,b,c = 5.5,2,10
>>> a
5.5
>>> b
2
>>> c
10


You can also use multiple assignment to swap any number of variables. Continuing 
the previous example: 

>>> a,b,c = c , a , b 
>>> a 
10 

>>> b 
5.5 
>>> c 
2 
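Python evaluates the whole right-hand side before assigning anything. As a result, multiple assignment can update several variables from each other's old values without a temporary variable. Here is a small sketch. fibonacci is a hypothetical helper name.

```python
def fibonacci(count):
    "Return a list of the first count Fibonacci numbers"
    numbers = []
    a, b = 0, 1
    for i in range(count):
        numbers.append(a)
        a, b = b, a + b   # both new values are computed from the old a and b
    return numbers

print(fibonacci(10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```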


Cross-Reference Multiple assignment is really tuple packing and unpacking, covered in Chapter 4.


Augmented assignment 

Another shorthand feature is augmented assignment, which enables you to combine 
an assignment and a binary operation into a single statement: 


>>> a = 10 
>>> a += 5 
>>> a 
15 
New Feature Augmented assignment was introduced in Python 2.0.


Python provides these augmented assignment operators:

+=   -=   *=   /=   %=   **=
>>=  <<=  &=   ^=   |=

The statement a += 5 is nearly identical to the longer form of a = a + 5 with two
exceptions (neither of which you need to worry about too often, but which are worth
knowing):

1. In augmented assignment, Python evaluates a only once instead of the two
times in the longhand version.

2. When possible, augmented assignment modifies the original object instead of
creating a new object. In the longhand example above, Python evaluates the
expression a + 5, creates a place in memory to hold the result, and then
reassigns a to reference the new data. With augmented assignment, however,
Python places the result in the original object if that object can be changed in
place (a list can, but a number cannot, so for numbers the two forms behave alike).
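The difference in point 2 shows up only with objects that can change in place, such as lists (covered in Chapter 4). In this minimal sketch, a second name that refers to the same list sees the change made by +=. It does not see the new list built by the longhand form.

```python
a = [1, 2]
alias_a = a          # a second name for the same list
a += [3]             # augmented: extends the original list in place

b = [1, 2]
alias_b = b
b = b + [3]          # longhand: builds a new list and rebinds b

print(alias_a)       # [1, 2, 3]: it shares the modified list
print(alias_b)       # [1, 2]: it still refers to the old list

n = 5
alias_n = n
n += 1               # numbers cannot change in place, so n is rebound
print(alias_n)       # 5
```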


Summary 

Python has several built-in data types and many features to help you work with 
them. In this chapter you: 

- Learned the rules for valid Python variable names and other identifiers.

- Created variables using integer, floating point, and other numerical data.

- Used augmented assignment statements to combine basic operations such as
addition with assignment.

In the next chapter you discover how to use expressions to compare data and you 
learn how character strings work in Python. 

CHAPTER 3


Expressions and Strings


Character strings can hold messages for users to read
(à la “Hello, world!”), but in Python they can also hold a
sequence of binary data. This chapter covers how you use
strings in your programs, and how you can convert between
strings, numbers, and other Python data types.

Before you leave this chapter, you’ll also have a solid grasp of
expressions and how your programs can use them to make
decisions and compare data.




In This Chapter 

Expressions 

Strings 

Converting between 
simple types 



Expressions 

Expressions are the core building blocks of decision making in
Python and other programming languages, and Python evaluates
each expression to see if it is true or false.

The most basic form of a Python expression is any value: if
the value is nonzero, it is considered to be “true,” and if it
equals 0, it is considered to be “false.”

Cross-Reference Chapter 4 goes on to explain that Python also considers
any nonempty and non-None objects to be true.

More common, however, is the comparison of two or more 
values with some sort of operator: 

>>> 12 > 5 # This expression is true. 

1 

>>> 2 < 1 # This expression is false. 

0 


Comparing numeric types 

Python supplies a standard set of operators for comparing
numerical data types. Table 3-1 lists these comparison operators
with examples.
Table 3-1
Comparison Operators

Operator  Description            Sample Input  Sample Output
<         Less than              10 < 5        0
>         Greater than           10 > 5        1
<=        Less than or equal     3 <= 5        1
                                 3 <= 3        1
>=        Greater than or equal  3 >= 5        0
==        Equality               3 == 3        1
                                 3 == 5        0
!=        Inequality*            3 != 5        1

* Python also supports an outdated inequality operator: <>. It may not be supported in the future.


Before comparing two numbers, Python applies the usual coercion rules if 
necessary. 

A comparison between two complex numbers involves only the real part of each 
number if they are different. Only if the real parts of both are the same does the 
comparison depend on the imaginary part: 

>>> 3 + 10j < 2 + 1000j
0
>>> 3 + 10j < 3 + 1000j
1

Python doesn’t restrict you to just two operands in a comparison; for example, you
can use the a < b < c notation common in mathematics:

>>> a,b,c = 10,20,30
>>> a < b < c    # True because 10 < 20 and 20 < 30
1

Note that a < b < c is the same as comparing a < b and then comparing b < c, except
that b is evaluated only once (besides being nifty, this could really make a difference
if evaluating b required a lot of processing time).
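You can see the single evaluation by making the middle operand a function call that records each time it runs. Here is a small sketch. middle and calls are hypothetical names.

```python
calls = []

def middle():
    calls.append(1)      # record one evaluation
    return 20

chained = 10 < middle() < 30                  # middle() runs once
expanded = 10 < middle() and middle() < 30    # middle() runs twice

print(chained and expanded)   # both forms are true
print(len(calls))             # 3: one call for the chain, two for the longhand
```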

Expressions like a < b > c are legal but discouraged, because to the casual observer 
(for example, you, late at night, searching for a bug in your code) they appear to 
imply a comparison or relationship between a and c, which is not really the case. 

Python has three additional functions that you can use when comparing data: 
min(x[, y, z, ...])

The min function takes two or more arguments of any type and returns the smallest:

>>> min(10,20.5,5,100L)
5

max(x[, y, z, ...])

Similarly, max chooses the largest of the arguments passed in:

>>> max(10,20.5,5,100L)
100L

Both min and max can accept a sequence as an argument (see Chapter 4 for
information on lists and tuples):

>>> Ages=[42,37,26]
>>> min(Ages)
26

cmp(x, y)

The comparison function takes two arguments and returns a negative number, 0, or
a positive number if the first argument is less than, equal to, or greater than the
second:

>>> cmp(2,5)
-1
>>> cmp(5,5.0)
0
>>> cmp(5,2)
1

Do not rely on the values being strictly 1, -1, or 0, especially when calling cmp with
other data types (for example, strings).
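Only the sign of cmp's result is meaningful, so code that uses it should test whether the result is less than, equal to, or greater than zero. Here is a minimal sketch with a hypothetical compare function that follows the same contract. The expression (x > y) - (x < y) relies on true and false comparisons counting as 1 and 0, and it also works on newer interpreters that no longer provide cmp.

```python
def compare(x, y):
    "Return a negative number, 0, or a positive number, like cmp(x, y)"
    return (x > y) - (x < y)

def describe(x, y):
    result = compare(x, y)
    # Test the sign, never a specific value such as -1.
    if result < 0:
        return "less"
    elif result == 0:
        return "equal"
    else:
        return "greater"

print(describe(2, 5))     # less
print(describe(5, 5.0))   # equal
print(describe(5, 2))     # greater
```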

Compound expressions 

A compound expression combines simple expressions using the Boolean operators 
and, or, and not. Python treats Boolean operators slightly differently than many 
other languages do. 

and 

When evaluating the expression a and b, Python evaluates a to see if it is false, and 
if so, the entire expression takes on the value of a. If a is true, Python evaluates b 
and the entire expression takes on the value of b. There are two important points 
here. First, the expression does not evaluate to just true or false (0 or 1): 
>>> a,b = 10,20
>>> a and b    # a is true, so evaluate b
20
>>> a,b = 0,5
>>> a and b    # a is false, so the result is a
0

Second, if a (the first expression) evaluates to false, then Python never bothers to
evaluate b (the second expression):

>>> 0 and 2/0   # Doesn't cause division by zero error
0


or 

With the expression a or b, Python evaluates a to see if it is true, and if so, the
entire expression takes on the value of a. When a is false, the expression takes on
the value of b:

>>> a,b = 10,20 
>>> a or b 
10 

>>> a,b = 0,5 
>>> a or b 
5 

Similar to the and operator, the expression takes on the value of either a or b
instead of just 0 or 1, and Python evaluates b only if a is false.
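This behavior makes or handy for supplying a default when a value is empty or zero (an empty string counts as false, as Chapter 4 explains). Here is a short sketch. greeting is a hypothetical function.

```python
def greeting(name):
    # If name is an empty string (false), or evaluates to the default instead.
    return "Hello, " + (name or "stranger") + "!"

print(greeting("Fred"))   # Hello, Fred!
print(greeting(""))       # Hello, stranger!
```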

not 

Finally, not inverts the “truthfulness” of an expression: if the expression evaluates 
to true, not returns false, and vice versa: 

>>> not 5 
0 

>>> not 0 
1 

>>> not (0 > 2) 

1 

Unlike the and and or operators, not always returns a value of 0 or 1. 

Complex expressions 

You can form arbitrarily complex expressions by grouping any number of expressions
together using parentheses and Boolean operators. For example, if you just
can’t seem to remember if a number is one of the first few prime numbers, this
expression will bail you out:
>>> i = 5
>>> (i == 2) or (i % 2 != 0 and 0 < i < 9)
1
>>> i = 2
>>> (i == 2) or (i % 2 != 0 and 0 < i < 9)
1
>>> i = 4
>>> (i == 2) or (i % 2 != 0 and 0 < i < 9)
0

If the number is 2, the first sub-expression (i == 2) evaluates to true and Python
stops processing the expression and returns 1 for true. Otherwise, two remaining
conditions must be met for the expression to evaluate to true. The number must
not be evenly divisible by 2, and it must be between 0 and 9 (hey, I said the first few
primes, remember?).
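Running the expression over every number from 0 to 9 shows exactly which values it accepts. Note that 1, which is not prime, slips through as well. Here is a quick sketch that wraps the expression in a hypothetical looks_prime function:

```python
def looks_prime(i):
    "The expression from the text, wrapped in a function"
    return (i == 2) or (i % 2 != 0 and 0 < i < 9)

print([i for i in range(10) if looks_prime(i)])   # [1, 2, 3, 5, 7]
```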

Parentheses let you explicitly control the order of what gets evaluated first. Without 
parentheses, the order of evaluation may be unclear and different than what you 
expect (and a great source of bugs): 

>>> 4 or 1 * 2 
4 

A well-placed pair of parentheses clears up any ambiguity: 

>>> (4 or 1) * 2 
8 


Operator precedence 

Python uses the ordering in Table 3-2 to guide the evaluation of complex expressions.
Expressions using operators higher up in the table get evaluated before
those towards the bottom of the table. Operators on the same line of the table have
equal priority or precedence. Python evaluates operators with the same precedence
from left to right; the exception is the power operator (**), which groups from right
to left, so 2 ** 3 ** 2 means 2 ** (3 ** 2).


Table 3-2
Operator Precedence (from highest to lowest)

Operators                  Description
`x`                        String conversion
{key:datum, ...}           Dictionary
[x,y,...]                  List
(x,y,...)                  Tuple
f(x,y,...)                 Function call
x[j:k]                     Slice
x[j]                       Subscription
x.attribute                Attribute reference
**                         Power*
~x                         Bitwise negation (inversion)
+x, -x                     Plus, minus
*, /, %                    Multiply, divide, modulo
+, -                       Add, subtract
<<, >>                     Shifting
&                          Bitwise AND
^                          Bitwise XOR
|                          Bitwise OR
<, <=, ==, !=, >=, >       Comparisons
is, is not                 Identity
in, not in                 Membership
not x                      Boolean NOT
and                        Boolean AND
or                         Boolean OR
lambda                     Lambda expression

* ** binds less tightly than a unary operator on its right, so 2 ** -1 means 2 ** (-1).
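Here is a sketch of a few expressions that exercise the precedence table; each comment shows what the line prints:

```python
print(2 + 3 * 4)        # 14: * binds more tightly than +
print((2 + 3) * 4)      # 20: parentheses override precedence
print(8 - 3 - 2)        # 3: same precedence, evaluated left to right
print(-2 ** 2)          # -4: ** binds more tightly than the unary minus on its left
print(2 ** 3 ** 2)      # 512: ** groups right to left, so this is 2 ** 9
print(1 or 0 and 0)     # 1: and binds more tightly than or
print(not 1 == 2)       # true: == is evaluated before not
```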



See Chapters 4 through 7 for more information on operators and data types such 
as lists and tuples that we have not yet covered. 


Strings 

A string is Python’s data type for holding not only text but also “non-printable” or
binary data. If you’ve done much work with strings in languages like C or C++, prepare
to be liberated from mundane memory management tasks as well as a plethora
of bugs lying in wait. Strings in Python were not added as an afterthought or tacked
on via a third party library, but are part of the core language itself, and it shows!








String literals

A string literal is a sequence of characters enclosed by a matching pair of single or
double quotes:

"Do you like green eggs and ham?"
'Amu vian najbaron'
"Tuesday'     # Illegal: quotes do not match.

Which of the two you use is more of a personal preference (in some nerdy way I
find single-quoted strings more sexy and “cool”), but sometimes the text of the
string makes one or the other more convenient:

'Quoth the Raven, "Nevermore."'
"Monty Python's Flying Circus"
"Enter your age (I'll know if you're lying, so don't): "

Python automatically joins two or more string literals separated only by whitespace:

>>> "one" 'two' "three"
'onetwothree'

A single backslash character inside a string literal lets you break a string across
multiple lines:

>>> 'Rubber baby \
... buggy bumpers'
'Rubber baby buggy bumpers'

If your string of text covers several lines and you want Python to preserve the exact
formatting you used when typing it in, use triple-quoted strings (the string begins
with three single or double quotes and ends with three more of the same type of
quote). An example:

>>> s = '''"Knock knock."
... "Who's there?"
... "Knock knock."
... "Who's there?"
... "Knock knock."
... "Who's there?"
... "Philip Glass."'''
>>> print s
"Knock knock."
"Who's there?"
"Knock knock."
"Who's there?"
"Knock knock."
"Who's there?"
"Philip Glass."
String length

Regardless of the quoting method you use, string literals can be of any length. You
can use the len(x) function to retrieve the length of a string:

>>> len('Pokey')
5
>>> s = 'Data:\x00\x01'
>>> len(s)
7

Escape sequences 

You can also use escape sequences to include quotes or other characters inside a 
string (see Table 3-3): 

>>> print "\"Never!\" shouted Skeptopotamus." 

"Never!" shouted Skeptopotamus. 


Table 3-3
Escape Sequences

Sequence   Description
\n         Newline (ASCII LF)
\'         Single quote
\"         Double quote
\\         Backslash
\t         Tab (ASCII TAB)
\b         Backspace (ASCII BS)
\r         Carriage return (ASCII CR)
\xhh       Character with ASCII value hh in hex
\ooo       Character with ASCII value ooo in octal
\f         Form feed (ASCII FF)*
\a         Bell (ASCII BEL)
\v         Vertical tab (ASCII VT)

* Not all output devices support all ASCII codes. You won't use \v very often, for example.


Table 3-3 lists the valid escape sequences. If you try to use an invalid escape
sequence, Python leaves both the backslash and the character after it in the string:

>>> print 'Time \z for foosball!'
Time \z for foosball!
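Each escape sequence stands for a single character, however many characters it takes to type. Here is a short sketch:

```python
print(len("\n"))          # 1: newline is one character
print(len("\\"))          # 1: one backslash
print("\x41" == "A")      # true: hex 41 is 65, the ASCII code for A
print("\101" == "A")      # true: octal 101 is also 65
print(len('It\'s'))       # 4: the escaped quote is one character
```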
As shown in Table 3-3, you can specify the characters of a string using their ASCII
values:

>>> '\x50\x79\x74\x68\x6f\x6e'
'Python'

Cross-Reference See "Converting Between Simple Types" later in this chapter for more on the ASCII
codes for characters.

The values can be in the range of 0 to 255 (the values that a single byte can have).
Remember: a string in Python doesn’t have to be printable text. A string could hold
the raw data of an image file, a binary message received over a network, or anything
else.


Raw strings

One final way to specify string literals is with raw strings, in which Python does not
treat backslashes as escape characters but leaves them in the string as they are. You flag a
string as a raw string with an r prefix. For example, on Windows systems the path
separator character is a backslash, so to use it in a string you’d normally have to
type '\\' (the escape sequence for the backslash). Alternatively, you could use a
raw string:

>>> s = r"c:\games\half-life\hl.exe"
>>> s
'c:\\games\\half-life\\hl.exe'
>>> print s
c:\games\half-life\hl.exe

Cross-Reference The os.path module provides easy, cross-platform path manipulation. See
Chapter 10 for details.
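The r prefix changes only how the literal is read, not the kind of string you get, so a raw string and its fully escaped equivalent compare equal. Here is a brief sketch:

```python
raw = r"c:\games\half-life\hl.exe"
escaped = "c:\\games\\half-life\\hl.exe"
print(raw == escaped)     # true: same string, two spellings
print(len(r"\n"))         # 2: a backslash followed by the letter n
print(len("\n"))          # 1: a single newline character
```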


Manipulating strings 

You can use the plus and multiply operators to build strings. The plus operator
concatenates strings together:

>>> a = 'ha '
>>> a + a + a
'ha ha ha '

The multiply operator repeats a string:

>>> '=' * 10
'=========='

Note that operator precedence rules apply, as always:

>>> 'Wh' + 'e' * 10 + '!'
'Wheeeeeeeeee!'

Augmented assignment works as well:

>>> a = 'Ah'
>>> a += ' Hah! '
>>> a
'Ah Hah! '
>>> a *= 2
>>> a
'Ah Hah! Ah Hah! '

Accessing individual characters and substrings 

Because strings are sequences of characters, you can use on them the same operators
that are common to all of Python’s sequence types, among them subscription
and slice.

Cross-Reference See Chapter 4 for a discussion of Python sequence types.

Subscription lets you use an index number to retrieve a single character from a
Python string, with 0 being the first character:

>>> s = 'Python'
>>> s[1]
'y'

Additionally, you can reference characters from the end of the string using negative
numbers. An index of -1 means the last character, -2 the next to last, and so on:

>>> 'Hello'[-1]
'o'
>>> 'Hello'[-5]
'H'

Python strings are immutable, which means you can’t directly change them or individual
characters (you can, of course, assign a new string to the same variable):

>>> s = 'Bad'
>>> s[2] = 'c'   # Can't modify the string value
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
TypeError: object doesn't support item assignment
>>> s = 'Good'   # Can reassign the variable
Slicing is similar to subscription except that with it you can retrieve entire
substrings instead of single characters. The operator takes two arguments for the
lower and upper bounds of the slice:

>>> 'Monty'[2:4]
'nt'

It’s important to understand that the bounds are not referring to character indices
(as with subscription), but really refer to the spots between characters:

  M   o   n   t   y
|   |   |   |   |   |
0   1   2   3   4   5

So the slice of 2:4 is like telling Python, “Give me everything from the right of 2 and
to the left of 4,” which is the substring “nt”.

The lower and upper bounds of a slice are optional. If omitted, Python sticks in the
beginning or ending bound of the string for you:

>>> s = 'Monty'
>>> s[:2]
'Mo'
>>> s[2:]
'nty'
>>> s[:]
'Monty'

Don’t forget: Python doesn’t mind if you use negative numbers as bounds for the
offset from the end of the string. Continuing the previous example:

>>> s[1:-1]
'ont'
>>> s[-3:-1]
'nt'
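Two handy properties follow from thinking of bounds as the spots between characters. First, splitting a string at any position and gluing the halves back together gives the original string. Second, bounds past the end are simply clipped to the string's length rather than raising an error. Here is a quick sketch:

```python
s = 'Monty'
for k in range(len(s) + 1):
    # Every character lands in exactly one of the two halves.
    assert s[:k] + s[k:] == s
print(s[3:100])       # ty: the upper bound is clipped to len(s)
print(repr(s[4:2]))   # '': a lower bound past the upper bound gives an empty string
```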
You can also access each character via sequence unpacking. This feature isn’t used as often because you have to use exactly the same number of variables as there are characters in the string:
>>> a,b,c = 'YES'
>>> print a, b, c
Y E S

Python does not have a separate 'character' data type; a character is just a string of 
length 1. 


Formatting strings 

The modulo operator (%) has special behavior when used with strings. You can use it like the C printf function for formatting data:

>>> "It's %d past %d, %s!" % (7,9,"Fred")
"It's 7 past 9, Fred!"

Python scans the string for conversion specifiers and replaces them with values from the tuple you supply. Table 3-4 lists the different characters you can use in a conversion and what they do.


Table 3-4
String Formatting Characters

Character    Description
d or i       Decimal (base 10) integer
f            Floating-point number
s            String or any object*
c            Single character
u            Unsigned decimal integer
x or X       Hexadecimal integer (lower or upper case)
o            Octal integer
e or E       Floating-point number in exponential form
g or G       Like %f unless the exponent is less than -4 or greater than or equal to the precision; in that case, acts like %e or %E
r            repr() version of the object*
%            Use %% to print the percentage character

* %s prints the str() version, %r prints the repr() version. See "Converting Between Simple Types" in this chapter.
Here are a few more examples:

>>> '%x %X' % (57005,48879)
'dead BEEF'
>>> pi = 3.14159
>>> '%f %E %G' % (pi,pi,pi)
'3.141590 3.141590E+000 3.14159'
>>> print '%s %r' % ('Hello','Hello')
Hello 'Hello'

Beyond these features, Python has several other options, some of which are 
holdovers from C. Between the % character and the conversion character you 
choose, you can have any combination of the following (in this order): 

Key name 

Instead of a tuple, you can provide a dictionary of values to use (dictionaries are covered in Chapter 4). Place the key names (enclosed in parentheses) between the percent sign and the type code in the format string. This one is best explained with an example (although fans of Mad Libs will be at home):

>>> d = {'name':'Sam', 'num':32, 'amt':10.12}
>>> '%(name)s is %(num)d years old. %(name)s has $%(amt).2f' % d
'Sam is 32 years old. Sam has $10.12'

- or 0

A minus sign indicates that numbers should be left-justified, and a 0 tells Python to pad the number with leading zeros. (Neither has much effect unless used with the minimum field width, explained below.)

+ 

A plus indicates that the number should always display its sign, even if the number 
is positive: 

>>> '%+d %+d' % (5,-5)
'+5 -5'

Minimum field width number 

A number indicates the minimum field width this value should take up. If printing the value takes up less space, Python adds padding (either spaces or zeros; see above) to make up the difference:

>>> '%5d' % 2   # Don't need a tuple if there's only one value
'    2'
>>> '%-5d, %05d' % (2,2)
'2    , 00002'
Additional precision-ish number

This final number is a period character followed by a number. For a string, the 
number is the maximum number of characters to print. For a floating-point number, 
it’s the number of digits to print after the decimal point, and for integers it’s the 
minimum number of digits to print. Got all that? 

>>> '%.3s' % 'Python'
'Pyt'
>>> '%05.3f' % 3.5
'3.500'
>>> '%-8.5d' % 10
'00010   '

Last but not least, you can use an asterisk in place of either number in the width or precision fields. If you supply an asterisk, you also supply the width or precision as an extra value in the tuple, just before the value being formatted:

>>> '%*.*f' % (6,3,1.41421356)
' 1.414'
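One practical use of the asterisk is sizing columns at run time. The sketch below is written for Python 3; format_report is a hypothetical helper, and the two-space gap between columns is an arbitrary choice. It lines up names and amounts by measuring the widest entry first and passing the widths in through '*':

```python
def format_report(rows, precision=2):
    """Format (name, amount) pairs as aligned text columns.

    Uses '*' in the width and precision fields so the column widths
    are computed at run time instead of hard-coded in the format string.
    """
    name_width = max(len(name) for name, amount in rows)
    amount_width = max(len('%.*f' % (precision, amount)) for name, amount in rows)
    lines = []
    for name, amount in rows:
        # '%-*s' left-justifies the name; '%*.*f' right-justifies the amount.
        lines.append('%-*s  %*.*f' % (name_width, name, amount_width, precision, amount))
    return '\n'.join(lines)
```

For example, format_report([('Sam', 10.12), ('Fred', 3.5)]) produces two lines whose amounts end in the same column.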

Comparing strings 

String comparison works much the same way numeric comparison does, using the standard comparison operators (<, <=, !=, ==, >=, >). The comparison is lexicographic (‘A’ < ‘B’) and case-sensitive:

>>> 'Fortran' > 'Pascal'
0
>>> 'Perl' < 'Python'
1

For a string in an expression, Python evaluates any nonempty string to true, and an empty string to false:

>>> 'OK' and 5
5
>>> not 'fun'
0
>>> not ''
1

This behavior provides a useful idiom for using a default value if a string is empty. For example, suppose that the variable s in the following example came from user input instead of you supplying the value. If the user chose something, name holds its value; otherwise name holds the default value of 'index.html'.

>>> s = ''; name = s or 'index.html'
>>> name
'index.html'
>>> s = 'page.html'; name = s or 'index.html'
>>> name
'page.html'
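Wrapped in a function, the idiom looks like this. It is a sketch for Python 3 with a hypothetical helper name, page_name; the call to strip() is an added assumption so that input consisting only of spaces also falls back to the default:

```python
def page_name(user_input, default='index.html'):
    """Return the user's choice, or the default when the input is empty.

    An empty string is false, so 'x or y' evaluates to y; any nonempty
    string is true, so 'x or y' evaluates to x itself.
    """
    # Strip surrounding whitespace first so input like '   ' also counts as empty.
    return user_input.strip() or default
```

Keep in mind that or treats every false value alike, so if the input could be something other than a string, zero and other empty values would also be replaced by the default.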

You can use the min, max, and cmp functions on strings:

>>> min('abstract')   # Find the least character in the string.
'a'
>>> max('i','love','spam')   # Find the greatest string.
'spam'
>>> cmp('Vader','Maul')   # Vader is greater.
1

Strings (and other sequence types) also have the in (and not in) operator, which tests if a character is a member of a string:

>>> 'u' in 'there?'
0
>>> 'i' not in 'teamwork'   # Cheesy
1

Chapter 9 covers advanced string searching and matching with regular expressions. 



Unicode string literals

Many computer languages limit characters in a string to values in the range of 0 to 255 because they store each one as a single byte, which makes it nearly impossible to support the non-ASCII characters used by so many languages besides plain old English. Unicode characters are 16-bit values (0 to 65535) and can therefore handle just about any character set imaginable.

New Feature: Full support for Unicode strings was a new addition in Python 2.0.

You can specify a Unicode literal string by prefixing a string with a u: 

>>> u'Rang' 
u'Rang' 

See Chapter 9 for more on using Unicode strings. 

Converting Between Simple Types 

Python provides many functions for converting between numerical and string data 
types in addition to the string formatting feature in the previous section. 
Converting to numerical types

The int, long, float, complex, and ord functions convert data to numerical types.


int (x[, radix]) 

This function converts a number or a string to an integer. When converting a string, you can also supply an optional base:

>>> int('15')
15
>>> int('15',16)   # In hexadecimal, sixteen is written "10"
21

The string it converts from must be a valid integer (trying to convert the string '3.5' would fail). Alternatively, the int function can convert other numbers to integers:

>>> int(3.5)
3
>>> int(10L)
10

The int function drops the fractional part of a number. To find the “closest” integer, use the round function (below).


long (x[, radix]) 

The long function can convert a string or another number to a long integer (you can also include a base):

>>> long('125')
125L
>>> long(17.6)
17L
>>> long('1E',16)
30L


float (x) 

You should be seeing a pattern by now:

>>> float(12.1)
12.1
>>> float(10L)
10.0
>>> int(float("3.5"))   # int("3.5") is illegal.
3


The exception is complex numbers, which float can’t convert; use the abs function to “convert” a complex number to a floating-point number (abs returns its magnitude).
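Because int('3.5') fails while float('3.5') succeeds, a common pattern is to try the stricter conversion first and fall back to the looser one. A minimal sketch, written for Python 3 (where int also covers what long did) with a hypothetical helper name, parse_number:

```python
def parse_number(text):
    """Convert a numeric string to int when possible, otherwise to float.

    int('3.5') raises ValueError, so fall back to float() for strings
    that contain a fractional part or an exponent. A string that is not
    a number at all still raises ValueError, from float().
    """
    try:
        return int(text)
    except ValueError:
        return float(text)
```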
round (num[, digits])

This function rounds a floating-point number to a number having the specified number of fractional digits. If you omit the digits argument, the result is a whole number:

>>> round(123.5678,3)
123.568
>>> round(123.5678)
124.0
>>> round(123.4)
123.0


complex (real[, imaginary]) 

The complex function can convert a string or number to a complex number. When you convert a number, you can also pass an optional imaginary part (it defaults to zero):

>>> complex('2+5j')
(2+5j)
>>> complex('2')
(2+0j)
>>> complex(6L,3)
(6+3j)


ord (ch) 

This function takes a single character (a string of length 1) as its argument and returns the ASCII or Unicode value for that character:

>>> ord(u'a')
97
>>> ord('b')
98


Converting to strings 

Going the other direction, the following functions take numbers and make them into strings.


chr (x) and unichr (x) 

Inverses of the ord function, these functions take a number representing an ASCII or Unicode value and convert it to a character:

>>> chr(98)
'b'
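Because ord and chr are inverses, you can do arithmetic on characters by converting to numbers and back. The following sketch (Python 3; shift_letters is a hypothetical name) implements a simple Caesar cipher that shifts only the ASCII letters a-z and A-Z:

```python
def shift_letters(text, n):
    """Shift each ASCII letter n places around the alphabet (a Caesar cipher).

    ord() turns a character into its code, arithmetic moves it, and chr()
    turns the new code back into a character. Non-letters pass through.
    """
    result = []
    for ch in text:
        if 'a' <= ch <= 'z':
            base = ord('a')
        elif 'A' <= ch <= 'Z':
            base = ord('A')
        else:
            result.append(ch)
            continue
        # Work in the range 0..25 so the shift can wrap from 'z' back to 'a'.
        result.append(chr((ord(ch) - base + n) % 26 + base))
    return ''.join(result)
```

Shifting by 13 twice returns the original text, since 26 is a full trip around the alphabet.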
oct (x) and hex (x)

These two functions take numbers and convert them to octal and hexadecimal string representations:

>>> oct(123)
'0173'
>>> hex(123)
'0x7b'

str (obj) 

The str function takes any object and returns a printable string version of that object:

>>> str(5)
'5'
>>> str(5.5)
'5.5'
>>> str(3+2j)
'(3+2j)'

Python calls this function when you use the print statement.

repr (obj) 

The repr function is similar to str except that it tries to return a string version of the object that is valid Python syntax. For simple data types, the outputs of str and repr are often identical. (See Chapter 9 for details.)

A popular shorthand for this function is to surround the object to convert in backticks (above the Tab key on most PC keyboards):

>>> a = 5
>>> 'Give me ' + a   # Can't add a string and an integer!
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
TypeError: cannot add type "int" to string
>>> 'Give me ' + `a`   # Convert to a string on the fly.
'Give me 5'

As of Python 2.1, repr displays newlines and other common escape sequences the same way you type them (instead of displaying their octal ASCII codes):

>>> 'Hello\nWorld'
'Hello\nWorld'

When you use the Python interpreter interactively, Python calls repr to display objects. You can have it use a different function by setting the value of sys.displayhook:
>>> 5.3
5.2999999999999998   # The standard representation is ugly.
>>> def printstr(s):
...     print str(s)
...
>>> import sys
>>> sys.displayhook = printstr
>>> 5.3
5.3   # A more human-friendly format

New Feature: The sys.displayhook feature is new in Python 2.1.


Summary 

Python has a complete set of operators for building expressions as complex as you 
need. Python’s built-in string data type offers powerful but convenient control over 
text and binary strings, freeing you from many maintenance tasks you’d be stuck 
with in other programming languages. In this chapter you: 

♦ Built string literals and formatted data in strings.
♦ Used Python’s operators to modify and compare data.
♦ Learned to convert between various data types and strings.

In the next chapter you’ll unleash the power of Python’s other built-in data types, including lists, tuples, and dictionaries.




Advanced Data Types

The simple data types in the last few chapters are common to many programming languages, although other languages often don’t make them as easy to manage or as powerful out of the box. The data types in this chapter, however, set Python apart from languages such as C, C++, or even Java, because they are built-in, intuitive and easy to use, and incredibly powerful.


Grouping Data with Sequences

Strings, lists, and tuples are Python’s built-in sequence data 
types. Each sequence type represents an ordered set of data 
elements. Unlike strings, where each piece of data is a single 
character, the elements that make up a list or a tuple can be 
anything, including other lists, tuples, strings, and so on. 
Though much of this section applies to strings, the focus here 
is on lists and tuples. 

Cross-Reference: Go directly to Chapter 3 to learn more about strings. Do not pass Go.

The main difference between lists and tuples is one of mutability: you can change, add, or remove items of a list, but you cannot change a tuple. Beyond this, though, you will find a conceptual difference in where you apply each. You’d use a list as an array to hold the lines of text from a file, for example, and a tuple to represent a 3-D point in space (x,y,z). Put another way, lists are great for dealing with many items that you’d process similarly, while a tuple often represents different parts of a single item. (Don’t worry: when you go to use either in a program, it becomes pretty obvious which one you need.)



In This Chapter

♦ Grouping data with sequences
♦ Working with sequences
♦ Using additional list object features
♦ Mapping information with dictionaries
♦ Understanding
Creating lists


Creating a list is straightforward because you don’t need to specify a particular 
data type or length. You can surround any piece of data in square brackets to create 
a list containing that data: 

>>> x = []   # An empty list
>>> y = ['Strawberry','Peach']
>>> z = [10,'Howdy',y]   # Mixed types and a list within a list
>>> z
[10, 'Howdy', ['Strawberry', 'Peach']]

You can call the list(seq) function to convert from one sequence type to a list:

>>> list((5,10))   # A tuple
[5, 10]
>>> list("The World")
['T', 'h', 'e', ' ', 'W', 'o', 'r', 'l', 'd']

If you call list on an object that is already a list, you get a copy of the original list back.



See "Copying Complex Objects" in this chapter for more on copying objects. 


Ranges 


You use the range([lower,] stop[, step]) function to generate a list whose members are some ordered progression of integers. Instead of idling away your time typing in the numbers from 0 to 9, you can do the same with a call to range:

>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # 10 items, starting at 0

You can also call the function with start and stop indices, and even a step to tell it how quickly to jump to the next item:

>>> range(6,12)
[6, 7, 8, 9, 10, 11]   # Stops just before the stop index.
>>> range(2,20,3)
[2, 5, 8, 11, 14, 17]
>>> range(20,2,-3)   # Going down!
[20, 17, 14, 11, 8, 5]

You most commonly use the range function in looping (which we cover in the next chapter):

>>> for i in range(10):
...     print i,
...
0 1 2 3 4 5 6 7 8 9
The xrange([lower,] stop[, step]) function is similar to range except that instead of creating a list, it returns an xrange object that behaves like a list but doesn’t calculate each list value until needed. This feature has the potential to save memory if the range is very large or to improve performance if you aren’t likely to iterate through every single member of the equivalent list.
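The idea behind xrange, producing each value only when it is asked for, can be sketched with a generator function. Generators arrived in a Python release later than the one this chapter describes, and in Python 3 the built-in range is itself lazy in this way; lazy_range is a hypothetical name used only for this illustration:

```python
def lazy_range(*args):
    """Yield the same integers as range(), one at a time, on demand.

    Accepts (stop), (start, stop) or (start, stop, step), like range().
    Each value is computed only when the caller asks for the next one.
    """
    if len(args) == 1:
        start, stop, step = 0, args[0], 1
    elif len(args) == 2:
        start, stop, step = args[0], args[1], 1
    elif len(args) == 3:
        start, stop, step = args
    else:
        raise TypeError('expected 1 to 3 arguments')
    if step == 0:
        raise ValueError('step must not be zero')
    i = start
    # Stop just before 'stop', counting up or down depending on the step's sign.
    while (step > 0 and i < stop) or (step < 0 and i > stop):
        yield i
        i += step
```

Asking a lazy_range(10**12) for its first value returns immediately, because no list of a trillion items is ever built.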


List comprehensions 

One final way to create a list is through list comprehensions, which are great if you want to operate on each item in a list and store the result in a new list, or if you want to create a list that contains only items that meet certain criteria. For example, to generate a list containing x squared for the numbers 1 through 10:

>>> [x*x for x in range(1,11)]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

New Feature: List comprehensions are new in Python 2.0.


Python uses range(1,11) to generate a list containing the numbers 1 through 10. Then, for each number in that list, it evaluates the expression x*x and adds the result to the output list.

You can add an if to the list comprehension so that items get added to the new list only if they pass some test. For example, to square only the even numbers from 0 through 9:

>>> [x*x for x in range(10) if x % 2 == 0]
[0, 4, 16, 36, 64]

But wait, there’s more! You can list more than one for statement, and Python evaluates each in order, processing the rest of the list comprehension each time:

>>> [a+b for a in 'ABC' for b in '123']
['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']

Python loops through each character of 'ABC' and for each one goes through the entire loop of each character in '123'.

See where this is going? You can have as many for statements as you want, and each one can have an if statement (but if you think you need five or six, you might want to break them into separate statements for sanity’s sake):

>>> [a+b+c for a in "HI" for b in "JOE" if b != 'E'
...        for c in '123' if c != '2']
['HJ1', 'HJ3', 'HO1', 'HO3', 'IJ1', 'IJ3', 'IO1', 'IO3']
Finally, the expression that Python evaluates to generate each item in the new list doesn’t have to be a simple data type such as an integer. You can also have it be lists, tuples, and so forth:

>>> [(x,ord(x)) for x in 'Ouch']
[('O', 79), ('u', 117), ('c', 99), ('h', 104)]
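It can help to see a multi-clause list comprehension unrolled into ordinary nested loops. The sketch below (Python 3; combos_with_loops is a hypothetical name) builds the same list as the "HI"/"JOE"/'123' example above, with each if nested directly inside the for clause that comes before it:

```python
def combos_with_loops():
    """Build the same list as the nested comprehension, step by step."""
    result = []
    for a in "HI":
        for b in "JOE":
            if b != 'E':            # filters the 'for b' loop
                for c in '123':
                    if c != '2':    # filters the 'for c' loop
                        result.append(a + b + c)
    return result


combos = [a+b+c for a in "HI" for b in "JOE" if b != 'E'
                for c in '123' if c != '2']
```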

Creating tuples 

Creating a tuple is similar to creating a list, except that you use parentheses instead 
of square brackets: 

>>> x = ()   # An empty tuple
>>> y = 22407,'Fredericksburg'   # ()'s are optional
>>> z = ('Mrs. White','Ballroom','Candlestick')

Parentheses can also enclose any expression, so Python has a special syntax to designate a tuple with only one item: a trailing comma. To create a tuple containing the string ‘lonely’:

>>> x = ('lonely',)
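The comma, not the parentheses, is what makes a tuple. This short check (Python 3; the variable names are arbitrary) shows the difference:

```python
not_a_tuple = ('lonely')    # just a parenthesized string expression
one_tuple = ('lonely',)     # the trailing comma makes a 1-item tuple
also_tuple = 'lonely',      # the comma alone is enough; () are optional
```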

Use the tuple(seq) function to convert one of the other sequence types to a tuple:

>>> tuple('tuple')
('t', 'u', 'p', 'l', 'e')
>>> tuple([1,2,3])
(1, 2, 3)


Working with Sequences 

Now that you have your list or tuple, what do you do with it? This section shows 
you the operators and functions you can use to work on sequence data. 


Joining and repeating with arithmetic operators 

Of the arithmetic operators, Python defines addition and multiplication for working with sequences. As with strings, the addition operator concatenates sequences and the multiplication operator repeats them:

>>> [1,2] + [5] + ['EGBDF']
[1, 2, 5, 'EGBDF']
>>> ('FACEG',) + (17,88)
('FACEG', 17, 88)
>>> (1,3+4j) * 2
(1, (3+4j), 1, (3+4j))
The augmented assignment versions of these operators work as well (although for strings and tuples Python doesn’t perform the operation in place but instead creates a new object):

>>> z = ['bow','arrow']
>>> z *= 2
>>> z
['bow', 'arrow', 'bow', 'arrow']
>>> q = (1,2)
>>> q += (3,4)
>>> q
(1, 2, 3, 4)
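The in-place versus new-object distinction matters when two names refer to the same object. This sketch (Python 3; alias and saved are arbitrary names) shows a list changing under both names while a tuple does not:

```python
z = ['bow', 'arrow']
alias = z           # a second name for the same list object
z *= 2              # lists are modified in place...
# ...so alias sees the change: both names still refer to one list.

q = (1, 2)
saved = q           # a second name for the same tuple object
q += (3, 4)         # tuples are immutable, so q is rebound to a new tuple
# saved still refers to the original (1, 2).
```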



Comparing and membership testing 

You can use the normal comparison (<, <=, >=, >) and equality (!=, ==) operators with sequence objects:

>>> ['five','two'] != [5,2]
1
>>> (0.5,2) < (0.5,1)
0

Python checks the corresponding element of each sequence until it can make a determination. When the items in two sequence objects are equal except that one has more items than the other, the longer is considered greater:

>>> [1,2,3] > [1,2]
1
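The comparison rule just described can be written out directly. Here is a minimal sketch for Python 3, where cmp no longer exists, so compare_sequences is a hypothetical stand-in that returns -1, 0, or 1 and walks both sequences in step:

```python
def compare_sequences(a, b):
    """Return -1, 0 or 1, comparing two sequences item by item.

    The first unequal pair of items decides; if one sequence runs out
    first, it is the smaller one.
    """
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared positions are equal: the longer sequence is greater.
    if len(a) == len(b):
        return 0
    return -1 if len(a) < len(b) else 1
```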

You can use the in operator to test if something is in a list or tuple, and not in to test if it is not:

>>> trouble = ('Dan','Joe','Bob')
>>> 'Bob' in trouble
1
>>> 'Dave' not in trouble
1


Accessing parts of sequences 

When you need to retrieve data from a sequence object, you have several 
alternatives. 


Subscription 

When you want to access a single element of a sequence object, you use the subscript or index of the element you want to reference, with the first element having an index of zero. (For some reason I get strange looks when I say, “Back to square zero!”):
>>> num = ['dek','dudek','tridek']
>>> num[1]
'dudek'
>>> num[-1]   # A negative index starts from the other end.
'tridek'


Slices 

Slices let you create a new sequence containing all or part of another sequence. You specify a slice in the form of [start:end], and for each element Python adds that element to the new sequence if its index i satisfies start <= i < end.


Tip: Conceptually, thinking of the slice parameters as pointing between items in a sequence is helpful.


>>> meses = ['marzo','abril','mayo','junio']
>>> meses[1:3]
['abril', 'mayo']
>>> meses[0:-2]   # Parameters can count from the right, too.
['marzo', 'abril']


The start and end parameters are both optional, and Python silently clips out-of-range bounds to the ends of the sequence:

>>> meses[2:]
['mayo', 'junio']
>>> meses[:2]
['marzo', 'abril']
>>> meses[-2:5000]
['mayo', 'junio']



See "Accessing individual characters and substrings" in Chapter 3 for more examples of using slices.


Unpacking 

Just as you can create a tuple by assigning a comma-separated list of items to a 
single variable, you can unpack a sequence object (not just tuples!) by doing the 
opposite: 


>>> s = 801,435,804
>>> x,y,z = s
>>> print x,y,z
801 435 804


Keep in mind that the number of variables on the left must match the length of the 
sequence you’re unpacking on the right. 



Multiple assignment (in Chapter 3) is really just a special case of tuple packing and unpacking: you pack the objects into a single tuple and then unpack them into the same number of original variables.
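Packing followed by unpacking gives Python its familiar swap idiom, and the length rule is enforced with a ValueError. A short sketch (Python 3; the names are arbitrary):

```python
a, b = 'left', 'right'
a, b = b, a            # pack (b, a) into a tuple, then unpack into a and b

point = (801, 435, 804)
x, y, z = point        # unpacking needs exactly as many names as items

mismatch = False
try:
    x, y = point       # three items but only two names
except ValueError:
    mismatch = True    # Python refuses rather than guessing
```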
Iterating with for...in

A common task is to loop over all the elements of a list or tuple and operate on each one. One of the easiest ways to do this is with a for...in statement:

>>> for op in ['sin','cos','tan']:
...     print op
...
sin
cos
tan


Using sequence utility functions 

Python provides a rich complement of sequence processing functions.


len (x), min (x[, y,z,...]), and max (x[, y,z,...]) 

These three aren’t really specific to sequences, but they’re quite useful nonetheless: 

>>> data = [0.5, 12, 18, 2, -5]
>>> len(data)   # Count of items in the sequence
5
>>> min(data)   # The minimum item in the sequence
-5
>>> max(data)   # The maximum item in the sequence
18

filter (function, list) 

When you call filter, it applies a function to each item in a sequence and returns all items for which the function returns true, thus filtering out all items for which the function returns false. In the following example I create a tiny function, nukeBad, that returns false if the string passed in contains the word 'bad'. Combining filter with nukeBad eliminates all those ‘bad’ words:

>>> def nukeBad(s):
...     return s.find('bad') == -1
...
>>> s = ['bad','good','Sinbad','bade','welcome']
>>> filter(nukeBad,s)
['good', 'welcome']

If you pass in None for the function argument, filter removes any items that are false, such as 0 or empty items:

>>> stuff = [12,0,'Hey',[],'',[1,2]]
>>> filter(None,stuff)
[12, 'Hey', [1, 2]]

The filter function returns the same sequence type as the one you passed in. The example below removes any digit characters from a string and returns a new string:

>>> filter(lambda d:not d.isdigit(),"P6yth12on")
'Python'

Cross-Reference: See Chapter 6 for more information on lambda expressions.
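To spell out what filter does, here is a minimal re-implementation using a list comprehension. It is a sketch for Python 3, whose built-in filter returns an iterator rather than a list or string, so filter_like (a hypothetical name) rebuilds the type-preserving behavior described above for strings and tuples:

```python
def filter_like(func, seq):
    """Keep the items of seq for which func(item) is true.

    Passing None as func keeps the items that are themselves true.
    Strings and tuples come back as the same type; any other
    sequence comes back as a list.
    """
    if func is None:
        func = bool
    kept = [item for item in seq if func(item)]
    if isinstance(seq, str):
        return ''.join(kept)
    if isinstance(seq, tuple):
        return tuple(kept)
    return kept
```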


map (function, list[, list,...]) 

The map function takes a function and a sequence and returns to you the result of applying the function to each item in the original sequence. Regardless of the type of sequence you pass in, map always returns a list:

>>> import string
>>> s = ['chile','canada','mexico']
>>> map(string.capitalize,s)
['Chile', 'Canada', 'Mexico']

You can pass in multiple lists, too, as long as the function you supply takes the same number of arguments as the number of lists you pass in:

>>> import operator
>>> s = [2,3,4,5]; t = [5,6,7,8]
>>> map(operator.mul,s,t)   # s[j] * t[j]
[10, 18, 28, 40]

Cross-Reference: Chapter 7 covers the operator module, which contains function versions of the standard operators so you can pass them into functions like map.

If the lists you use are of different lengths, map pads the shorter ones with None to make up the difference. Also, if you pass in None instead of a function, map combines the corresponding elements from each sequence and returns them as tuples (compare this to the behavior of the zip function, later in this section):

>>> a = [1,2,3]; b = [4,5,6]; c = [7,8,9]
>>> map(None,a,b,c)
[(1, 4, 7), (2, 5, 8), (3, 6, 9)]
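In Python 3, map stops at the shortest sequence and no longer accepts None as the function, so the padding behavior described above needs itertools.zip_longest. A sketch, with map_padded as a hypothetical helper name; it assumes at least two sequences, since Python 2's map(None, seq) with a single sequence returned the items themselves rather than 1-tuples:

```python
from itertools import zip_longest


def map_padded(func, *seqs):
    """Apply func across two or more sequences, padding short ones with None.

    With func=None, return the padded tuples themselves, like the
    map(None, a, b, c) call shown above.
    """
    rows = zip_longest(*seqs)   # yields tuples, filling gaps with None
    if func is None:
        return list(rows)
    return [func(*row) for row in rows]
```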

reduce (function, seq[, init]) 

This function takes the first two items in the sequence you pass in, passes them to the function you supply, takes the result and the next item in the list, passes them to the function, and so on until it has processed all the items:

>>> import operator
>>> reduce(operator.mul,[2,3,4,5])
120   # 120 = ((2*3)*4)*5

An optional third parameter is an initializer that reduce uses in the very first calculation, or returns when the list is empty. The following example starts with the string '-' and adds each character of a word to the beginning and end of the string (because strings are sequences, reduce calls the function once for each letter in the string):

>>> reduce(lambda x,y: y+x+y, "Hello", '-')
'olleH-Hello'
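Written out as a loop, reduce is short. This sketch is for Python 3, where the built-in moved to functools.reduce; reduce_like is a hypothetical name, and it uses a private sentinel object so it can tell a missing initializer apart from an initializer of None:

```python
_MISSING = object()   # sentinel meaning "no initializer supplied"


def reduce_like(func, seq, init=_MISSING):
    """Fold seq into a single value by repeatedly calling func(acc, item)."""
    items = iter(seq)
    if init is _MISSING:
        try:
            acc = next(items)       # the first item seeds the accumulator
        except StopIteration:
            raise TypeError('reduce_like() of empty sequence with no initial value')
    else:
        acc = init                  # also the result when seq is empty
    for item in items:
        acc = func(acc, item)
    return acc
```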


zip (seq[, seq,...]) 

The zip function combines corresponding items from two or more sequences and returns them as a list of tuples, stopping after it has processed all the items in the shortest sequence:

>>> zip([1,1,2,3,5],[8,13,21])
[(1, 8), (1, 13), (2, 21)]

You may find the zip function convenient when you want to iterate over several lists in parallel:

>>> names = ['Joe','Fred','Sam']
>>> exts = [116,120,100]
>>> ages = [26,34,28]
>>> for name, ext, age in zip(names,exts,ages):
...     print '%s (extension %d) is %d' % (name,ext,age)
...
Joe (extension 116) is 26
Fred (extension 120) is 34
Sam (extension 100) is 28

Passing in just one sequence to zip returns each item as a 1-tuple:

>>> zip((1,2,3,4))
[(1,), (2,), (3,), (4,)]

New Feature: The zip function was introduced in Python 2.0.


Using Additional List Object Features 

List objects have several methods that further facilitate their use, and because they 
are mutable they support a few extra operations. 

Additional operations 

You can replace the value of any item with an assignment statement: 

>>> todo = ['dishes','garbage','sweep','mow lawn','dust']
>>> todo[1] = 'boogie'
>>> todo
['dishes', 'boogie', 'sweep', 'mow lawn', 'dust']

What gets replaced in the list doesn’t need to be limited to a single item. You can 
choose to replace an entire slice with a new list: 
>>> todo[1:3] = ['nap']   # Replace from 1 to before 3
>>> todo
['dishes', 'nap', 'mow lawn', 'dust']
>>> todo[2:] = ['eat','drink','be merry']
>>> todo
['dishes', 'nap', 'eat', 'drink', 'be merry']

And finally, you can delete items or slices using del:

>>> del todo[0]
>>> todo
['nap', 'eat', 'drink', 'be merry']
>>> del todo[1:3]
>>> todo
['nap', 'be merry']


List object methods 

The following methods are available on all list objects. 

append (obj) and extend (obj) 

The append method adds a single item to the end of a list. Like the += operator, it modifies the original list in place, but unlike +=, the item you pass to append is added as one element rather than being treated as a list of items. The extend method does assume the argument you pass it is a list, and adds each of its items:

>>> z = ['Nevada','Virginia']
>>> z.append('Utah')
>>> z
['Nevada', 'Virginia', 'Utah']
>>> z.extend(['North Carolina','Georgia'])
>>> z
['Nevada', 'Virginia', 'Utah', 'North Carolina', 'Georgia']
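The difference between append and extend shows up most clearly when the argument is itself a list. A quick sketch (Python 3; states, nested, and flat are arbitrary names):

```python
states = ['Nevada', 'Virginia']

nested = list(states)                # start from a copy
nested.append(['Utah', 'Georgia'])   # one new item: the whole list

flat = list(states)
flat.extend(['Utah', 'Georgia'])     # two new items: each element
```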

index (obj) 

This method returns the index of the first matching item in the list, if present, and raises the ValueError exception if not:

>>> x = [15,12,'Foo',16,12]
>>> x.index(12)
1
>>> try: print x.index('Farmer')
... except ValueError: print 'NOT ON LIST!'
...
NOT ON LIST!

See the next chapter for information on try...except blocks.
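If you would rather get a sentinel value than handle the exception at every call site, you can wrap index in a small helper. This is a sketch for Python 3; find_index and its default of -1 are hypothetical choices, not part of the list API:

```python
def find_index(seq, item, default=-1):
    """Return the position of item in seq, or default if it is absent.

    Wraps list.index, which raises ValueError when the item is missing.
    """
    try:
        return seq.index(item)
    except ValueError:
        return default
```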



count (obj) 

You use the count method to find out how many items in the list match the one you 
pass in: 




>>> x = [15,12,'Foo',16,12]
>>> x.count(12)
2



Cross-Reference: String objects also have count and index methods. See Chapter 9 for details.


insert (j, obj)

Use the insert method to add a new item anywhere in the list. Pass in the index of the item you want the new one to come before and the item to insert:

>>> months = ['March','May','June']
>>> months.insert(1,'April')
>>> months
['March', 'April', 'May', 'June']

Notice that insert is pretty forgiving if you pass in a bogus index:

>>> months.insert(-1,'February')   # Item added at start
>>> months.insert(5000,'July')   # Item added at end
>>> months
['February', 'March', 'April', 'May', 'June', 'July']

remove (obj) 

This method locates the first occurrence of an item in the list and removes it, if present, and yells at you if not:

>>> months.remove('March')
>>> months
['February', 'April', 'May', 'June', 'July']
>>> months.remove('August')
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
ValueError: list.remove(x): x not in list


pop ([i])


If you specify an index, pop removes the item from that place in the list and returns it. Without an index, pop removes and returns the last item from the list:

>>> saludos = ['Hasta!','Ciao','Nos vemos']
>>> saludos.pop(1)
'Ciao'
>>> saludos
['Hasta!', 'Nos vemos']
>>> saludos.pop()
'Nos vemos'
Calling pop on an empty list causes it to raise IndexError.

reverse ()

As named, the reverse method reverses the order of the list:

>>> names = ['Jacob','Hannah','Rachael','Jennie']
>>> names.reverse()
>>> names
['Jennie', 'Rachael', 'Hannah', 'Jacob']

sort([func]) 

This method orders the items in a list. Continuing the previous example:

>>> names.sort()
>>> names
['Hannah', 'Jacob', 'Jennie', 'Rachael']

Additionally, you can provide your own comparison function to use during the sort. This function accepts two arguments and returns a negative number, 0, or a positive number if the first argument is less than, equal to, or greater than the second. For example, to order a list by the length of each item:

>>> names.sort(lambda a,b: len(a)-len(b))   # Ch 5 covers lambdas.
>>> names
['Jacob', 'Hannah', 'Jennie', 'Rachael']
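Python 3's sort no longer accepts a comparison function directly; it takes a key argument instead. This sketch shows both routes: functools.cmp_to_key adapts an existing two-argument comparison such as the lambda above, while key=len expresses the same ordering more directly. Python 3's sort is stable, so items of equal length keep their existing order (the variable names here are arbitrary):

```python
from functools import cmp_to_key

names = ['Hannah', 'Jacob', 'Jennie', 'Rachael']

# Reuse a two-argument comparison function like the lambda above.
by_length_cmp = sorted(names, key=cmp_to_key(lambda a, b: len(a) - len(b)))

# Or pass a one-argument key function: here, the length of each item.
by_length_key = sorted(names, key=len)
```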

Tip: If you want to add and remove items in a sorted list, use the bisect module. When you insert an item using its insort(list, item) function, it uses a bisection algorithm to inexpensively find the correct place to insert the item so that the resulting list remains sorted. The bisect(list, item) function in the same module finds the correct insertion point without actually adding the item to the list.
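A minimal sketch of the bisect functions in use (Python 3). The remove_sorted helper is hypothetical; it uses bisect.bisect_left to locate an item before deleting it, so the list stays sorted:

```python
import bisect

scores = [10, 20, 30, 40]
bisect.insort(scores, 25)          # scores is now [10, 20, 25, 30, 40]
pos = bisect.bisect(scores, 35)    # where 35 would go; scores is unchanged


def remove_sorted(seq, item):
    """Remove one occurrence of item from sorted seq; return True if found."""
    i = bisect.bisect_left(seq, item)   # leftmost position item could occupy
    if i < len(seq) and seq[i] == item:
        del seq[i]
        return True
    return False
```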


Mapping Information with Dictionaries 

A dictionary contains a set of mappings between unique keys and their values; dictionaries are Python’s only built-in mapping data type. The examples in this section use the following dictionary, which maps Web site names to login user names and passwords (who can ever keep track of them all?):

>>> logins = {'yahoo':('john','jyahooohn'),
...           'hotmail':('jrf5','18thStreet')}
>>> logins['hotmail']   # What's my name/password for hotmail?
('jrf5', '18thStreet')
Creating and adding to dictionaries

You create a dictionary by listing zero or more key-value pairs within curly braces. 
The keys used in a dictionary must be unique and immutable, so strings, numbers, 
and tuples with immutable items in them can all be used as keys. The values in the 
key-value pair can be anything, even other dictionaries if you want. 

Adding or replacing mappings is easy:

>>> logins['slashdot'] = ('juan','lemmein')


Accessing and updating dictionary mappings 

If you try to use a key that doesn't exist in the dictionary, Python barks out a KeyError exception. When you don't want to worry about handling the exception, you can instead use the get(key[, obj]) method, which returns None if the mapping doesn't exist, and even lets you specify a default value for such cases:

>>> logins['sourceforge','No such login']
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
KeyError: ('sourceforge', 'No such login')
>>> logins.get('sourceforge') == None
1
>>> logins.get('sourceforge','No such login')
'No such login'

The setdefault(key[, obj]) method works like get with the default parameter, except that if the key-value pair doesn't exist, Python adds it to the dictionary:

>>> logins.setdefault('slashdot',('jimmy','punk'))
('juan', 'lemmein') # Existing item returned
>>> logins.setdefault('justwhispers',('jimmy','punk'))
('jimmy', 'punk') # New item returned AND added to dictionary
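As a sketch of where get and setdefault earn their keep (the word list is hypothetical, and the code targets current Python 3), the snippet below looks up a missing site without risking a KeyError, then groups words by their first letter, letting setdefault create each letter's list the first time that letter appears.

```python
logins = {'hotmail': ('jrf5', '18thStreet')}

# get() never raises KeyError: it returns None or the default you supply
missing = logins.get('sourceforge')
fallback = logins.get('sourceforge', 'No such login')

# setdefault() returns the existing value, or inserts and returns the default,
# so a single line both creates a new list and appends to it
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)
```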

If you just want to know whether a dictionary has a particular key (or if you want to check before requesting it), you can use the has_key(key) method:

>>> logins.has_key('yahoo')
1

The del statement removes an item from a dictionary:

>>> del logins['yahoo']
>>> logins.has_key('yahoo')
0
"Hashability"


The more precise requirement of a dictionary key is that it must be hashable. An object's hash value is a semi-unique, internally generated number that can be used for quick comparisons. Consider comparing two strings, for example. To see if the strings are equal, you would have to compare each character until one differed. If you already had the hash value for each string, however, you could compare the two numbers first: if they differ, the strings are certainly different, and only when they match do you need the full character-by-character comparison.

Python uses hash values in dictionary lookups for the same reason: so that dictionary 
lookups will not be too costly. 

You can retrieve the hash value of any hashable object by using the hash(obj) function:

>>> hash('hash')
-1671425852
>>> hash(10)
10
>>> hash(10.0) # Equal numbers of different types have the same hash.
10
>>> hash((1,2,3))
-821448277

The hash function raises the TypeError exception on unhashable objects (lists, for example). 
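A small helper makes the hashability rule concrete. This is a sketch rather than a library function: is_hashable is a hypothetical name, and it simply asks hash() and treats a TypeError as "unhashable." Note that a tuple is hashable only if everything inside it is.

```python
def is_hashable(obj):
    """Return True if obj could be used as a dictionary key."""
    try:
        hash(obj)
    except TypeError:        # lists, dictionaries, and tuples holding them
        return False
    return True

# Because equal numbers hash alike, 10 and 10.0 find the same entry
table = {10: 'ten'}
same_entry = table[10.0]
```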


You can use the update(dict) method to add the items from one dictionary to another:

>>> z = {}
>>> z['slashdot'] = ('fred','fred')
>>> z.update(logins)
>>> z
{'justwhispers': ('jimmy', 'punk'),
'slashdot': ('juan', 'lemmein'), # Duplicate key overwritten
'hotmail': ('jrf5', '18thStreet')}
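Because update overwrites duplicate keys, the order of the calls decides which value wins. A common pattern, sketched here with made-up setting names, is to copy a dictionary of defaults and then update the copy with the user's choices, which leaves the defaults themselves untouched.

```python
defaults = {'color': 'blue', 'size': 'medium'}
user_prefs = {'size': 'large'}          # hypothetical user overrides

settings = defaults.copy()              # shallow copy, so defaults stay intact
settings.update(user_prefs)             # 'size' from user_prefs replaces 'medium'
```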

Additional dictionary operations 

Here are a few other functions and methods of dictionaries that are straightforward 
and useful: 

>>> len(logins) # How many items?
3
>>> logins.keys() # List the keys of the mappings
['justwhispers', 'slashdot', 'hotmail']
>>> logins.values() # List the other half of the mappings
[('jimmy', 'punk'), ('juan', 'lemmein'), ('jrf5', '18thStreet')]
>>> logins.items() # Both pieces together as tuples
[('justwhispers', ('jimmy', 'punk')), ('slashdot', ('juan', 'lemmein')),
('hotmail', ('jrf5', '18thStreet'))]
>>> logins.clear() # Delete everything
>>> logins
{}


You can destructively iterate through a dictionary by calling its popitem() method, which removes an arbitrary key-value pair from the dictionary and returns it as a tuple:

>>> d = {'one':1, 'two':2, 'three':3}
>>> try:
...     while 1:
...         print d.popitem()
... except KeyError: # Raises KeyError when empty
...     pass
('one', 1)
('three', 3)
('two', 2)

New: popitem is new in Python 2.1.
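The try/except loop above can also be written by testing whether the dictionary is empty, since an empty dictionary counts as false. This sketch (drain is a hypothetical helper name; the code targets current Python 3) collects the pairs instead of printing them. The order in which popitem hands back pairs is not something to rely on, so compare the results without depending on order.

```python
def drain(d):
    """Remove every (key, value) pair from d with popitem() and return them."""
    removed = []
    while d:                          # stops once the dictionary is empty
        removed.append(d.popitem())
    return removed

d = {'one': 1, 'two': 2, 'three': 3}
pairs = drain(d)                      # d is now empty
```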


Dictionary objects also provide a copy() method that creates a shallow copy of the dictionary:

>>> a = {1:'one', 2:'two', 3:'three'}
>>> b = a.copy()
>>> b
{3: 'three', 2: 'two', 1: 'one'}



See "Copying Complex Objects" later in this chapter for a comparison of shallow 
and deep copies. 


Understanding References 

Python stores any piece of data in an object, and variables are merely references to an object; they are names for a particular spot in the computer's memory. All objects have a unique identity number, a type, and a value.

Object identity 

Because the object, and not the variable, has the data type (for example, integer), a 
variable can reference a list at one moment and a floating-point number the next. 

An object's type can never change, but for lists and other mutable types its value can change.
Python provides the id(obj) function to retrieve an object's identity (which, in the current implementation, is just the object's address in memory):

>>> shoppingList = ['candy','cookies','ice cream']
>>> id(shoppingList)
17611492
>>> id(5)
3114676

The is operator compares the identities of two objects to see if they are the same:

>>> junkFood = shoppingList # Both reference the same object
>>> junkFood is shoppingList
1
>>> yummyStuff = ['candy','cookies','ice cream']
>>> junkFood is not yummyStuff # Different identity, but...
1
>>> junkFood == yummyStuff # ...same value
1

Because variables just reference objects, a change in a mutable object's value is visible to all variables referencing that object:

>>> a = [1,2,3,4]
>>> b = a
>>> a[2] = 5
>>> b
[1, 2, 5, 4]
>>> a = 6
>>> b = a # Reference the same object for now.
>>> b
6
>>> a = a + 1 # Python creates a new object to hold (a+1)
>>> b # so b still references the original object.
6

Counting references 

Each object also contains a reference count that tells how many references (variables, list items, and so on) currently point to that object. When you assign a variable to an object or when you make an object a member of a list or other container, the reference count goes up. When you destroy, reassign, or remove an object from a container, the reference count goes down. If the reference count reaches zero (nothing references this object), Python's garbage collector destroys the object and reclaims the memory it was using.

The sys.getrefcount(obj) function returns the reference count for the given 
object. 
Cross-Reference: See Chapter 26 for more on Python's garbage collector.


New: As of version 2.0, Python also collects objects that are kept alive only by circular references. For example,

a = []; b = []
a.append(b); b.append(a)
a = 5; b = 10 # Reassign both variables to different objects.

The two list objects still have a reference count of 1 because each is a member of the other's list. Python now recognizes such cases and reclaims the memory used by the list objects.
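The following sketch makes cycle collection visible, assuming CPython running current Python 3. Plain lists cannot be weakly referenced, so it swaps in instances of a hypothetical Node class, and it uses a weak reference from the weakref module (which does not raise the reference count) to check whether the object still exists after gc.collect() runs the cycle detector.

```python
import gc
import weakref

class Node(object):
    """Hypothetical container; unlike lists, instances allow weak references."""
    def __init__(self):
        self.partner = None

a = Node(); b = Node()
a.partner = b; b.partner = a      # each object now references the other
probe = weakref.ref(a)            # watches a without adding a reference

a = 5; b = 10                     # only the cycle keeps the two Nodes alive
gc.collect()                      # the cycle detector finds and frees them
```

After the collection, probe() returns None, meaning the object it watched has been reclaimed.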


Keep in mind that the del statement deletes a variable and not an object, although if the variable you delete was the last to reference an object then Python may end up deleting the object too:


>>> a = [1,2,3]
>>> b = a # List object has 2 references now
>>> del a # Back to 1 reference
>>> b
[1, 2, 3]



You can also create weak references to objects, or references that do not affect an object's reference count. See Chapter 7 for more information.


Copying Complex Objects 

Assigning a variable to a list object creates a reference to the list, but what if you 
want to create a copy of the list? Python enables you to make two different types of 
copies, depending on what you need to do. 

Shallow copies 

A shallow copy of a list or other container object makes a copy of the object itself but creates references to the objects contained by the list. An easy way to make a shallow copy of a sequence is by requesting a slice of the entire object:

>>> faceCards = ['A','K','Q','J']
>>> myHand = faceCards[:] # Create a copy, not a reference
>>> myHand is faceCards
0
>>> myHand == faceCards
1




You can also use the copy(obj) function of the copy module:

>>> import copy
>>> highCards = copy.copy(faceCards)
>>> highCards is faceCards, highCards == faceCards
(0, 1)


Deep copies 

A deep copy makes a copy of the container object and recursively makes copies of all the children objects. For example, consider the case when a list contains a list. A shallow copy of the parent list would contain a reference to the child list, not a separate copy. As a result, changes to the inner list would be visible from both copies of the parent list:

>>> myAccount = [1000, ['Checking','Savings']]
>>> yourAccount = myAccount[:]
>>> myAccount[1].remove('Savings') # Modify the child list.
>>> myAccount
[1000, ['Checking']] # Different parent objects share a
>>> yourAccount # reference to the same child list.
[1000, ['Checking']]

Now look at the same example by using the deepcopy(obj) function in the copy module:

>>> myAccount = [1000, ['Checking','Savings']]
>>> yourAccount = copy.deepcopy(myAccount)
>>> myAccount[1].remove('Savings')
>>> myAccount
[1000, ['Checking']] # deepcopy copied the child list too.
>>> yourAccount
[1000, ['Checking', 'Savings']]

The deepcopy function tracks which objects it copied so that if an object directly 
or indirectly references itself, deepcopy makes only one copy of that object. 
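To see that bookkeeping at work, the sketch below (the account data is invented for illustration) builds a list that holds the same child list twice and also contains itself. The deep copy keeps that shape: the shared child is copied only once, and the self-reference points at the new copy rather than at the original.

```python
import copy

inner = ['Checking', 'Savings']
account = [1000, inner, inner]   # the same child list appears twice
account.append(account)          # ...and the list even contains itself

clone = copy.deepcopy(account)

# Shape is preserved: both child slots share one new list,
# and the last item is the clone itself, not the original
shared_child = clone[1] is clone[2]
points_to_self = clone[3] is clone
```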

Not all objects can be copied safely. For example, copying a socket that has an open connection to a remote computer won't work because part of the object's internal state (the open connection) is outside the realms of Python. File objects are another example of forbidden copy territory, and Python lets you know:

>>> f = open('foo','wt')
>>> copy.deepcopy(f)
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
  File "D:\Python20\lib\copy.py", line 147, in deepcopy
    raise error, \
Error: un-deep-copyable object of type <type 'file'>
Chapter 7 shows you how to override standard behaviors on classes you create. By defining your own __getstate__ and __setstate__ methods you can control how your objects respond to shallow and deep copy operations.
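Here is a minimal sketch of that idea, assuming a new-style class (written to run on current Python 3) and a hypothetical Counter class that holds a threading.Lock, which cannot be copied. __getstate__ leaves the lock out of the state that gets copied, and __setstate__ gives each copy a fresh lock of its own.

```python
import copy
import threading

class Counter(object):
    """Hypothetical class whose lock must not be shared or copied."""
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['lock']             # copy everything except the lock
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()  # each copy gets its own new lock

original = Counter()
original.count = 5
twin = copy.deepcopy(original)
```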


Identifying Data Types 

You can check the data type of any object at runtime, enabling your programs to correctly handle different types of data (for example, think of the int function that works when you pass it an integer, a float, a string, and so on). You can retrieve the type of any object by passing the object to the type(obj) function:

>>> type(5)
<type 'int'>
>>> type('She sells seashells')
<type 'string'>
>>> import operator
>>> type(operator)
<type 'module'>

The types module contains the type objects for Python’s built-in data types. The 
following example creates a function that prints a list of words in uppercase. To 
make it more convenient to use, the function accepts either a single string or a list 
of strings: 

>>> import types
>>> def upEm(words):
...     if type(words) != types.ListType: # Not a list so
...         words = [words]               # make it a list.
...     for word in words:
...         print word.upper()
...
>>> upEm('horse')
HORSE
>>> upEm(['horse','cow','sheep'])
HORSE
COW
SHEEP

The following list shows a few of the more common types you’ll use. 

BuiltinFunctionType
FunctionType
MethodType
BuiltinMethodType
InstanceType
ModuleType
ClassType
IntType
NoneType
DictType
LambdaType
StringType
FileType
ListType
TupleType
FloatType
LongType

Classes and instances of classes have the types ClassType and InstanceType, respectively. Python provides the isinstance(obj, class) and issubclass(class1, class2) functions to test whether an object is an instance of a particular type or class, and whether one class is a subclass of another:

>>> isinstance(5.1,types.FloatType)
1
>>> class Foo:
...     pass
...
>>> a = Foo()
>>> isinstance(a,Foo)
1
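Unlike comparing type(obj) against a single type, isinstance also accepts instances of subclasses. In this sketch the Account and SavingsAccount classes are hypothetical, and the code is written to run on current Python 3, where the built-in float plays the role of types.FloatType.

```python
class Account(object):
    pass

class SavingsAccount(Account):     # a more specific kind of Account
    pass

mine = SavingsAccount()

is_savings = isinstance(mine, SavingsAccount)
is_account = isinstance(mine, Account)        # subclasses count too
savings_is_kind_of_account = issubclass(SavingsAccount, Account)
account_is_kind_of_savings = issubclass(Account, SavingsAccount)
```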

Chapter 7 covers creating and using classes and objects. 


Working with Array Objects 

While lists are flexible in that they let you store any type of data in them, that flexibility comes at a cost of more memory and a little less performance. In most cases, this isn't an issue, but in cases where you want to exchange a little flexibility for performance or low-level access, you can use the array module to create an array object.

Creating arrays 

An array object is similar to a list except that it can hold only certain types of simple data and only one type at any given time. When you create an array object, you specify which type of data it will hold:

>>> import array
>>> z = array.array('B') # Create an array of bytes
>>> z.append(5)
>>> z[0]
5
>>> q = array.array('i',[5,10,-12,13]) # Optional initializer
>>> q
array('i', [5, 10, -12, 13])

Table 4-1 lists the type codes you use to create each type of array. You can retrieve the size of items and the type code of an array object using its itemsize and typecode members.


Table 4-1
Array Type Codes

Code     Equivalent C Type                Minimum Size in Bytes*
c        char                             1
b (B)    signed char (unsigned char)      1
h (H)    short (unsigned short)           2
i (I)    int (unsigned int)               2
l (L)    long (unsigned long)             4
f        float                            4
d        double                           8

* Actual size may be greater, depending on the implementation.
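Because the table lists only minimum sizes, it is worth asking the running interpreter what it actually uses. This sketch assumes current Python 3, which dropped the 'c' code, so it loops over the remaining codes from Table 4-1 and records each array's itemsize.

```python
import array

# Map each type code from Table 4-1 to the item size on this platform
sizes = {}
for code in 'bBhHiIlLfd':
    sizes[code] = array.array(code).itemsize

z = array.array('h', [10, 1000, 500])
code_used = z.typecode            # the code the array was created with
```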


Converting between types 

Array objects have built-in support for converting to and from lists and strings, and 
for reading and writing with files. The following examples all deal with an array 
object of two-byte short integers initially containing the numbers 10, 1000, and 500: 

>>> z = array.array('h',[10,1000,500]) 

>>> z.itemsize 
2 

Lists 

The tolist() method converts the array to an ordinary list:

>>> z.tolist()
[10, 1000, 500]
The fromlist(list) method appends items from a normal list to the end of the array:

>>> z.fromlist([2,4])
>>> z
array('h', [10, 1000, 500, 2, 4])

If any item in the list to add is of an incorrect type, fromlist adds none of the items to the array object.
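The all-or-nothing behavior of fromlist can be checked directly. In this sketch, written for current Python 3 (where a string in an 'h' array raises TypeError), one bad item in the middle of the list makes the whole call fail, and the array keeps its original three items.

```python
import array

z = array.array('h', [10, 1000, 500])
try:
    z.fromlist([2, 'oops', 4])     # 'oops' is not a short integer
except TypeError:
    rejected = True
else:
    rejected = False

z.fromlist([2, 4])                 # a clean list is appended as usual
```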

Strings 

You can convert an array to a sequence of bytes using the tostring() method:

>>> z.tostring()
'\n\x00\xe8\x03\xf4\x01\x02\x00\x04\x00'
>>> len(z.tostring())
10 # 5 items, 2 bytes each

The fromstring(str) method goes in the other direction, taking a string of bytes and converting them to values for the array:

>>> z.fromstring('\x10\x00\x00\x02') # Little-endian: \x10\x00 = 16, \x00\x02 = 512
>>> z
array('h', [10, 1000, 500, 2, 4, 16, 512])
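In Python 3 these conversions were renamed: tostring became tobytes and fromstring became frombytes, and the old names have since been removed. This sketch round-trips an array through raw bytes using the new names. The exact bytes depend on the machine's byte order, which sys.byteorder reports; the output shown above is little-endian.

```python
import array
import sys

z = array.array('h', [10, 1000, 500, 2, 4])

raw = z.tobytes()                   # Python 3 name for tostring()
rebuilt = array.array('h')
rebuilt.frombytes(raw)              # Python 3 name for fromstring()

byte_order = sys.byteorder          # 'little' or 'big', depending on the machine
```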


Files 

The tofile(file) method converts the array to a sequence of bytes (just like tostring) and writes the resulting bytes to a file you pass in:

>>> z = array.array('h',[10,1000,500])
>>> f = open('myarray','wb') # Chapter 8 covers files.
>>> z.tofile(f)
>>> f.close()

The fromfile(file, count) method reads the specified number of items in from a file object and appends them to the array. Continuing the previous example:

>>> z.fromfile(open('myarray','rb'),3) # Read 3 items.
>>> z
array('h', [10, 1000, 500, 10, 1000, 500])

If the file ends before reading in the number of items you requested, fromfile raises the EOFError exception, but still adds as many valid items as it could to the array.
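The following sketch, written for current Python 3, checks the EOFError behavior just described. It writes three items to a throwaway file in a temporary directory, then asks fromfile for five; the call raises EOFError, yet the three items that were present are kept.

```python
import array
import os
import tempfile

z = array.array('h', [10, 1000, 500])
loaded = array.array('h')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'myarray')    # throwaway file for the demo
    with open(path, 'wb') as f:
        z.tofile(f)                        # writes 3 items as raw bytes
    with open(path, 'rb') as f:
        try:
            loaded.fromfile(f, 5)          # asks for 2 more items than exist
        except EOFError:
            hit_eof = True
        else:
            hit_eof = False
```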

Cross-Reference: The marshal, pickle, and struct modules all provide additional (and often better) methods for converting to and from sequences of bytes for use in files and network messages. See Chapter 12 for more.
Array methods and operations

Array objects support many of the same functions and methods of lists: len, append, extend, count, index, insert, pop, remove, and reverse. You can access individual members with subscription, and you can use slicing to return a smaller portion of the array (although it returns another array object and not a list).

The buffer_info() method returns some low-level information about the current array. The returned tuple contains the memory address of the buffer and the length of the buffer measured in items, not bytes (multiply it by itemsize to get the size in bytes). This information is valid until you destroy the array or it changes length.

You can use the byteswap() method to change the byte order of each item in the array, which is useful for converting between big-endian and little-endian data:

>>> z = array.array('I',[1,2,3])
>>> z.byteswap()
>>> z
array('I', [16777216L, 33554432L, 50331648L])
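Calling byteswap twice restores the original values, and a typical use is producing big-endian ("network order") bytes on a little-endian machine. The sketch assumes current Python 3 (for tobytes), and to_big_endian is a hypothetical helper name.

```python
import array
import sys

z = array.array('I', [1, 2, 3])
z.byteswap()
swapped = z.tolist()        # [16777216, 33554432, 50331648] when itemsize is 4
z.byteswap()                # swapping again restores [1, 2, 3]

def to_big_endian(values):
    """Return the raw bytes of an 'I' array in big-endian order."""
    a = array.array('I', values)
    if sys.byteorder == 'little':
        a.byteswap()        # only needed when the machine is little-endian
    return a.tobytes()
```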

Cross-Reference: See Chapter 12 for information on cross-platform byte ordering.

Cross-Reference: NumPy (Numeric Python) is a Python extension that you can also use to create arrays, but it has much better support for using the resulting arrays in calculations. See Chapter 31 for more information on NumPy.


Summary 

Python provides several powerful and easy-to-use data types that simplify working with different types of data. In this chapter you:

♦ Learned the differences between Python's sequence types.
♦ Organized data with lists, sequences, and dictionaries.
♦ Created shallow and deep copies of complex objects.
♦ Used an object's type to handle it appropriately.
♦ Built array objects to hold homogeneous data.

The next chapter shows you how to expand your programs to include loops and 
decisions and how to catch errors with exceptions. 




Control Flow


A program is more than simply a list of actions. A program can perform an action several times (with for- and while-loops), handle various cases (with if-statements), and cope with problems along the way (with exceptions).

This chapter explains how to control the flow of execution in Python. A simple Game of Life program illustrates these techniques in practice.

Making Decisions with If-Statements

The if-statement evaluates a conditional expression. If the 
expression is true, the program executes the if-block. For 
example: 

if (CustomerAge>55):
    print "You get a senior citizen's discount!"

An if-statement may have an else-block. If the expression is false, the else-block (if any) executes. This code block prints one greeting for Bob, and another for everyone else:

if (UserName=="Bob"):
    print "Greetings, O supreme commander!"
else:
    print "Hello, humble peasant."

An if-statement may have one or more elif-blocks (“elif” is 
shorter to type than “else if” and has the same effect). When 
Python encounters such a statement, it evaluates the if- 
expression, then the first elif-expression, and so on, until one 
of the expressions evaluates to true. Then, Python executes 
the corresponding block of code. 

When Python executes an if-statement, it executes no more 
than one block of code. (If there is an else-block, then exactly 
one block of code gets executed.) 



In This Chapter

Making decisions with if-statements
Using for-loops
Using while-loops
Throwing and catching exceptions
Debugging with assertions
Example: Game of Life
Listing 5-1 is a sample script that uses an if-statement in a simple number-guessing game.


Listing 5-1: NumberGuess.py 


import random
import sys

# This line chooses a random integer >=1 and <=100.
# (See Chapter 15 for a proper explanation.)
SecretNumber=random.randint(1,100)

print "I'm thinking of a number between 1 and 100."
# Loop forever (at least until the user hits Ctrl-Break).
while (1):
    print "Guess my number."
    # The following line reads a line of input from
    # the command line and converts it to an integer.
    NumberGuess=int(sys.stdin.readline())
    if (NumberGuess==SecretNumber):
        print "Correct! Choosing a new number..."
        SecretNumber=random.randint(1,100)
    elif (NumberGuess > SecretNumber):
        print "Lower."
    else:
        print "Higher."


You can use many elif clauses; the usual way to write Python code that handles five different cases is with an if-elif-elif-elif-else statement. (Veterans of C and Java, take note: Python does not have a switch statement.)

Note: Python stops checking if-expressions as soon as it finds a true one. If you write an if-statement to handle several different cases, consider putting the most common and/or cheapest-to-check cases first in order to make your program faster.


Using For-Loops 

For-loops let your program do something several times. In addition, you can iterate 
over elements of a sequence with a for-loop. 

Anatomy of a for-loop 

A simple for statement has the following syntax: 
for <variable> in <sequence>:
    <loop body>

The statement (or block) following the for statement forms the body of the loop. Python executes the body once for each element of the sequence. The loop variable takes on each element's value, in order, from first to last. For instance:

for Word in ["serious","silly","slinky"]:
    print "The minister's cat is a "+Word+" cat."

The body of a loop can be a single statement on the same line as the for-statement: 

for Name in ["Tom","Dick","Harry"]: print Name 

Some people (myself included) usually stick with the first style, because all-on-one-line loops can lead to long and tricky lines of code.

Python can loop over any sequence type — even a string. If the sequence is empty, 
the loop body never executes. 

Looping example: encoding strings 

Listing 5-2 uses for-loops to convert strings to a list of hexadecimal values, and 
back again. The encoded strings look somewhat similar to the “decoder rings” 
popular on old children’s radio programs. 


Listing 5-2: DecoderRing.py 


import string

def Encode(MessageString):
    EncodedList=[]
    # Iterate over each character in the string
    for Char in MessageString:
        EncodedList.append("%x" % ord(Char))
    return EncodedList

def Decode(SecretMessage):
    DecodedList=[]
    # Iterate over each element in the list
    for HexValue in SecretMessage:
        # The following line converts HexValue from
        # a hex string to an integer, then finds the ASCII
        # symbol for that integer, and finally adds that
        # character to the list.
        # Don't try this at home! :)
        DecodedList.append(chr(int(HexValue,16)))
    # Join these strings together, with no separator.
    return string.join(DecodedList,"")

if (__name__=="__main__"):
    SecretMessage = Encode("Remember to drink your Ovaltine!")
    print SecretMessage
    print Decode(SecretMessage)


Listing 5-3: DecoderRing.py output 


['52', '65', '6d', '65', '6d', '62', '65', '72', '20', '74',
'6f', '20', '64', '72', '69', '6e', '6b', '20', '79', '6f',
'75', '72', '20', '4f', '76', '61', '6c', '74', '69', '6e',
'65', '21']
Remember to drink your Ovaltine!


Ranges and xranges 

Many loops do something a fixed number of times. To iterate over a range of 
numbers, use range. For example: 

# print 10 numbers (from 0 to 9)
for X in range(10):
    print X

The function range returns a list of numbers that you can use anywhere (not just in a loop). The syntax is: range([start,]end[,step]). The numbers in the range begin with start, increment by step each time, and stop just before end. Both start and step are optional; by default, a range starts at 0 and increments by 1. For example:

>>> range(10,0,-1) # Countdown! 

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1] 

>>> range(5,10) 

[5, 6, 7, 8, 9] 
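A few more range calls follow, written for current Python 3, where range returns a lazy range object (much like the xrange described in this section) rather than a list; wrapping it in list() shows the numbers. The final loop relies on end being excluded, so range(1, 101) covers 1 through 100.

```python
countdown = list(range(10, 0, -1))   # start, end, negative step
evens = list(range(0, 10, 2))        # every second number below 10
empty = list(range(5, 5))            # start == end gives no numbers at all

total = 0
for x in range(1, 101):              # 1, 2, ..., 100
    total += x
```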

Code that does something once for each element of a sequence sometimes loops over range(len(SequenceVariable)). This range contains the index of each element in the sequence. For example, this code prints the days of the week:

DaysOfWeek=["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
for X in range(len(DaysOfWeek)):
    print "Day",X,"is",DaysOfWeek[X]
An xrange is an object that represents a range of numbers. You can loop over an xrange instead of the list returned by range. The only real difference is that creating a large range involves creating a memory-hogging list, while creating an xrange of any size is cheap. Try checking your system's free memory while running these interpreter commands:

>>> MemoryHog = range(1000000) # There goes lots of RAM!
>>> BigXRange=xrange(1000000) # Only uses a little memory.

To see the contents of an xrange in convenient list form, use the tolist method:

>>> SmallXRange=xrange(10,110,10)
>>> SmallXRange.tolist()
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

Breaking, continuing, and else-clauses

Python's continue statement jumps to the next iteration of a loop. The break statement jumps out of a loop entirely. These statements apply only to the innermost loop; if you are in a loop-within-a-loop-within-a-loop, break jumps out of only the innermost loop.

You can follow the body of a for-loop with an else-clause. The code in the else-clause 
executes after the loop finishes iterating, unless the program exits the loop due to a 
break statement. (If you have no break statement in the loop, the else-clause 
always executes, so you really have no need to put the code in an else-clause.) 

Listing 5-4 illustrates break, continue, and an else-clause:


Listing 5-4: ClosestPoint.py


import math

def FindClosestPointAboveXAxis(PointList,TargetPoint):
    """
    Given a list of points and a target point, this function
    returns the list's closest point, and its distance from the
    target. It ignores all points with a negative y-coordinate. We
    represent points in the plane (or on screen) as a two-valued
    tuple of the form (x-coordinate,y-coordinate).
    """
    ClosestPoint=None # Initialize.
    ClosestDistance=None
    # Iterate over each point in the list.
    for Point in PointList:
        # Throw out any point below the X axis.
        if (Point[1]<0):
            # Skip to the next point in the list.
            continue
        # Compute the distance from this point to the target.
        # The following two lines are one statement;
        # indentation for clarity is optional.
        DistToPoint=math.sqrt((TargetPoint[0]-Point[0])**2 +
            (TargetPoint[1]-Point[1])**2)
        if (ClosestDistance == None or
            DistToPoint < ClosestDistance):
            ClosestPoint=Point
            ClosestDistance = DistToPoint
        if (DistToPoint==0):
            print "Point found in list"
            # Exit the loop entirely, since no point will
            # be closer than this
            break
    else:
        # This clause executes unless we hit the break above.
        print "Point not found in list"
    return (ClosestPoint, ClosestDistance)


Here is the function in action: 

>>> SomePoints=[(-1,-1), (4,5), (-5,7), (23,-2), (5,2)]
>>> import ClosestPoint
>>> ClosestPoint.FindClosestPointAboveXAxis(SomePoints,(1,1))
Point not found in list
((5, 2), 4.1231056256176606)
>>> ClosestPoint.FindClosestPointAboveXAxis(SomePoints,(-1,-1))
Point not found in list
((5, 2), 6.7082039324993694)
>>> ClosestPoint.FindClosestPointAboveXAxis(SomePoints,(4,5))
Point found in list
((4, 5), 0.0)


Changing horses in midstream 

Modifying the sequence that you are in the process of looping over is not recommended. Python won't get confused, but any mere mortals reading your program will.

The loop keeps iterating over the sequence it started with, even if you rebind the variable that referred to that sequence. For example, this loop prints the numbers from 0 to 99; changing the value that MyRange points to does not affect control flow:

MyRange=range(100)
for X in MyRange:
    print X
    MyRange = range(30) # No change in looping behavior!
However, changing the sequence itself (modifying it in place) does affect the loop. After executing for the nth element in a sequence, the loop proceeds to the (n+1)th element, even if the sequence changes in the process. For example, this loop prints even numbers from 0 to 98:

MyRange=range(100)
for X in MyRange:
    print X
    del MyRange[0] # Changing the loop sequence in place
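The sketch below (written for current Python 3, with hypothetical function names) records what the loop actually sees. Deleting the first item on every pass makes the loop skip every other number. The usual way to avoid the surprise is to loop over a shallow copy made with a full slice and modify the original instead.

```python
def visit_while_deleting(n):
    """Record the values a for-loop visits while it deletes seq[0] each pass."""
    seq = list(range(n))
    seen = []
    for x in seq:
        seen.append(x)
        del seq[0]              # later items shift left, so one gets skipped
    return seen

def visit_copy_while_deleting(n):
    """Same, but loop over a slice copy so deletions cannot disturb the loop."""
    seq = list(range(n))
    seen = []
    for x in seq[:]:
        seen.append(x)
        del seq[0]
    return seen, seq
```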

Modifying the loop variable inside a for-loop is also inadvisable. It does not change looping behavior; Python will continue the next iteration of the loop as usual.


Using While-Loops 

If you could crossbreed an if-statement and a for-loop, you would get a while-statement, Python's other looping construct.

A while-statement has the form: 

while (<expression>):
    <block of code>

When Python encounters a while-statement, it evaluates the expression, and if the expression is true, it executes the corresponding block of code. Python keeps executing the block of code until the expression is no longer true. For example, this code counts down from 10 to 1:

X=10
while (X>0):
    print X
    X -= 1

Within a while-loop, you can use the continue statement to jump to the next iteration, or the break statement to jump out of the loop entirely. A while-loop can also have an else-block. Code in the else-block executes immediately after the last iteration, unless a break statement exits the loop. These statements work similarly for for-loops and while-loops. See the section on for-loops, above, for examples of break, continue, and else.
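A linear search is a natural fit for the else-block, since the else runs only when the loop ends without a break. This is a sketch: find_index is a hypothetical helper that returns -1 when the target is absent.

```python
def find_index(items, target):
    """Return the index of target in items, or -1 if it is not there."""
    i = 0
    while i < len(items):
        if items[i] == target:
            break               # found: skip the else-block
        i += 1
    else:
        return -1               # condition became false without a break
    return i
```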


Throwing and Catching Exceptions

Imagine a Python program innocently going about its business, when suddenly ... 
[dramatic, scary music] something goes wrong. 
In general, when a function or method encounters a situation that it can't cope
with, it raises an exception. An exception is a Python object that represents an 
error. 

Passing the buck: propagating exceptions 

When a function raises an exception, the function must either handle the exception immediately or terminate. If the function doesn't handle the exception, the caller may handle it. If not, the caller also terminates immediately. The exception propagates up the call-stack until someone handles the error. If nobody catches the exception, the whole program terminates.

In general, functions that return a value should return None to indicate a “reasonable” failure, and only raise an exception for “unreasonable” problems. Just what is
reasonable is open to debate, so it is generally a good idea to clearly document the 
exceptions your code raises, and to handle common exceptions raised by the code 
you call. 
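The sketch below (hypothetical function names and records, written for current Python 3) shows an exception propagating two levels. parse_age and load_user have no handlers, so a ValueError raised inside int() terminates both of them and is finally caught by load_all, which records the bad record and keeps going.

```python
def parse_age(text):
    return int(text)                    # raises ValueError for non-numbers

def load_user(record):
    name, age_text = record.split(',')  # wrong field count also raises ValueError
    return name, parse_age(age_text)    # no handler: errors pass straight through

def load_all(records):
    """Handle errors at the top, after they propagate up from the callees."""
    users, bad_records = [], []
    for record in records:
        try:
            users.append(load_user(record))
        except ValueError:
            bad_records.append(record)
    return users, bad_records

users, bad = load_all(['Ann,31', 'Bob,old', 'Cy'])
```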

Handling an exception

If you have some “suspicious” code that may raise an exception, you can defend your program by placing the suspicious code in a try: block. After the try: block, include an except statement, followed by a block of code which handles the problem (as elegantly as possible).

For example, the guess-the-number program from earlier in this chapter crashes if 
you try to feed it something other than an integer. The error looks something like 
this: 

Traceback (most recent call last):
  File "C:\Python20\NumberGuess.py", line 7, in ?
    NumberGuess=int(sys.stdin.readline())
ValueError: invalid literal for int(): whoops!

Listing 5-5 shows a new-and-improved script that handles the exception. The call to sys.stdin.readline() is now in a try: block:


Listing 5-5: NumberGuess2.py 


import random
import sys

# This line chooses a random integer >=1 and <=100.
# (See Chapter 15 for a proper explanation.)
SecretNumber=random.randint(1,100)

print "I'm thinking of a number between 1 and 100."
# Loop forever (at least until the user hits Ctrl-Break).
while (1):
    print "Guess my number."
    # The following line reads a line of input from
    # the command line and converts it to an integer.
    try:
        NumberGuess=int(sys.stdin.readline())
    except ValueError:
        print "Please type a whole number."
        continue
    if (NumberGuess==SecretNumber):
        print "Correct! Choosing a new number..."
        SecretNumber=random.randint(1,100)
    elif (NumberGuess > SecretNumber):
        print "Lower."
    else:
        print "Higher."


More on exceptions 

An exception can have an argument, which is a value that gives additional informa- 
tion about the problem. The contents (and even the type) of the argument vary by 
exception. You capture an exception’s argument by supplying a variable in the 
except clause: except ExceptionType, ArgumentVariable.

You can supply several except clauses to handle various types of exceptions. In this 
case, exceptions are handled by the first applicable except clause. You can also 
provide a generic except clause, which handles any exception. If you do this, I 
highly recommend that you do something with the exception. Code that silently 
“swallows” exceptions may mask important bugs, like a NameError. Here is some 
cookie-cutter code I use for quick-and-dirty error handling: 

import sys, traceback

try:
    DoDangerousStuff()
except:
    # The show must go on!
    # Print the exception and the stack trace, and continue.
    (ErrorType,ErrorValue,ErrorTB)=sys.exc_info()
    print ErrorType, ErrorValue
    traceback.print_tb(ErrorTB)
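Later versions of Python spell the except ExceptionType, ArgumentVariable form as except ExceptionType as ArgumentVariable, and Python 3 accepts only the as spelling. The following Python 3 sketch uses several except clauses to show that the first applicable clause wins. ZeroDivisionError is a subclass of ArithmeticError, so it must be listed first to be reached at all. The function name Describe is hypothetical:

```python
import math

def Describe(Function, *Args):
    """Call Function(*Args) and report which except clause handled any failure."""
    try:
        return "ok: %r" % (Function(*Args),)
    except ZeroDivisionError as Argument:
        # Listed first, so it wins over the more general ArithmeticError.
        return "division problem: %s" % Argument
    except ArithmeticError as Argument:
        # Catches other arithmetic errors, such as OverflowError.
        return "arithmetic problem: %s" % Argument
    except Exception as Argument:
        # Generic clause: report the exception rather than silently swallowing it.
        return "other problem: %s: %s" % (type(Argument).__name__, Argument)

print(Describe(lambda: 1 / 0))
print(Describe(math.exp, 1000))
print(Describe(int, "whoops!"))
print(Describe(len, "abc"))
```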

After the except clause(s), you can include an else clause. The code in the else block
executes if the code in the try: block does not raise an exception. The else block is a
good place for code that does not need the try: block’s protection.

Python raises an IOError exception if you try to open a file that doesn’t exist. Here
is a snippet of code that handles a missing file without crashing. (This code grabs
the exception argument, a tuple consisting of an error number and error string,
but doesn’t do anything interesting with it.)
try:
    OptionsFile=open("SecretOptions.txt")
except IOError, (ErrorNumber,ErrorString):
    # Assume our default option values are all OK.
    # We need a statement here, but we have nothing
    # to do, so we pass.
    pass
else:
    # This executes if we opened it without an IOError.
    ParseOptionsFile(OptionsFile)
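Here is a slightly fuller version of the missing-file pattern, written in Python 3. It assumes a simple options format of one key=value pair per line. The names LoadOptions and DEFAULT_OPTIONS are hypothetical. The sketch checks the error number so that only a missing file (errno.ENOENT) falls back to the defaults, while other I/O problems still reach the caller:

```python
import errno

DEFAULT_OPTIONS = {"color": "blue"}   # hypothetical default settings

def LoadOptions(FileName):
    """Return options read from FileName, or the defaults if the file is missing."""
    Options = dict(DEFAULT_OPTIONS)
    try:
        OptionsFile = open(FileName)
    except IOError as Error:
        # Error.errno and Error.strerror hold the error number and error string.
        if Error.errno != errno.ENOENT:
            raise  # Not a missing file: let the caller see the problem.
    else:
        # Runs only if open() succeeded.
        with OptionsFile:
            for Line in OptionsFile:
                if "=" in Line:
                    Key, Value = Line.split("=", 1)
                    Options[Key.strip()] = Value.strip()
    return Options
```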

Defining and raising exceptions 

You can raise exceptions with the statement raise ExceptionType, Argument.
ExceptionType is the type of exception (for example, NameError). Argument is a
value for the exception argument. Argument is optional; if not supplied, the
exception argument is None.

An exception can be a string, a class, or an object. Most of the exceptions that the 
Python core raises are classes, with an argument that is an instance of the class. 
Defining new exceptions is quite easy, as this contrived example demonstrates: 

import random

def CalculateElfHitPoints(Level):
    if Level<1:
        raise "Invalid elf level!",Level
    # (The code below won't execute if we raise
    # the exception.)
    HitPoints=0
    for DieRoll in range(Level):
        HitPoints += random.randint(1,6)
    return HitPoints

Note In order to catch an exception, an except clause must refer to the same exception
that was raised. Python compares string exceptions by identity (is, not ==).
So, if you have code to raise "BigProblem" and an except clause for "BigProblem",
the except clause may not catch the exception. (The strings are equal, but
may not occupy the same spot in memory.) To handle exceptions properly, use a
named constant string, or a class. (See Listing 5-6 for an example.)
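String exceptions were later removed from Python, and modern versions accept only exception classes derived from BaseException and their instances. A rough Python 3 equivalent of the elf example therefore defines its own exception class, which also avoids the identity problem described in the note. The class name InvalidElfLevel is hypothetical:

```python
import random

class InvalidElfLevel(Exception):
    """Raised when an elf's level is below 1."""

def CalculateElfHitPoints(Level):
    if Level < 1:
        # Raise an instance of our class; Level becomes the exception's argument.
        raise InvalidElfLevel(Level)
    # Roll one six-sided die per level and add up the results.
    return sum(random.randint(1, 6) for DieRoll in range(Level))

try:
    CalculateElfHitPoints(0)
except InvalidElfLevel as Problem:
    # A class-based except clause matches by class, not by string identity.
    print("Invalid elf level:", Problem.args[0])
```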

Cleaning up with finally 

An alternative mechanism for coping with failure is the finally block. The 
finally block is a place to put any code that must execute, whether the try-block 
raised an exception or not. You can provide except clause(s), or a finally clause,
but not both.

For example, multithreaded programs often use a lock to prevent threads from 
stomping on each other’s data. If a thread acquires a lock and crashes without 
releasing it, the other threads may be kept waiting forever — an unpleasant situa- 
tion called deadlock. This example is a perfect job for the finally clause: 
DataLock.acquire()
try:
    # ... do things with the data ...
    pass
finally:
    # This code *must* execute. The fate of the
    # free world hangs in the balance!
    DataLock.release()
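The following Python 3 sketch uses threading.Lock from the standard library. The names DataLock and UpdateData are hypothetical. The sketch acquires the lock just before the try: block, so the finally clause only ever releases a lock the thread actually holds. It deliberately raises an exception to show that the lock is released anyway:

```python
import threading

DataLock = threading.Lock()

def UpdateData(Data, Key, Value):
    DataLock.acquire()
    try:
        if Key is None:
            raise ValueError("no key")   # simulate a crash in the middle of the update
        Data[Key] = Value
    finally:
        # Runs whether or not the try block raised, so the lock is never left held.
        DataLock.release()

Shared = {}
UpdateData(Shared, "answer", 42)
try:
    UpdateData(Shared, None, 0)
except ValueError:
    print("Update failed, but the lock was released:", not DataLock.locked())
```

Later versions of Python also let you write with DataLock: to get the same acquire/release pairing.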


Debugging with Assertions 

An assertion is a sanity-check that you can turn on (for maximum paranoia) or turn
off (to speed things up). Using an assertion can help make code self-documenting;
raising an AssertionError implies that a problem is due to programmer error rather
than to ordinary run-time conditions. Programmers often place assertions at the start
of a function to check for valid input, and after a function call to check for valid output.


Assertions in Python 

You can add assertions to your code with the syntax assert <Expression>. When 
it encounters an assert statement, Python evaluates the accompanying expression,
which is hopefully true. If the expression is false, Python raises an
AssertionError.

You can include an assertion argument, via the syntax assert
Expression, ArgumentExpression. If the assertion fails, Python uses
ArgumentExpression as the argument for the AssertionError.

For example, here is a function that converts a temperature from degrees Kelvin to
degrees Fahrenheit. Since zero degrees Kelvin is as cold as it gets, the function bails
out if it sees a negative temperature:

>>> def KelvinToFahrenheit(Temperature):
    assert (Temperature >= 0),"Colder than absolute zero!"
    return ((Temperature-273)*1.8)+32

>>> KelvinToFahrenheit(273)
32.0
>>> int(KelvinToFahrenheit(505.78))
451
>>> KelvinToFahrenheit(-5)
Traceback (innermost last):
  File "<pyshell#186>", line 1, in ?
    KelvinToFahrenheit(-5)
  File "<pyshell#178>", line 2, in KelvinToFahrenheit
    assert (Temperature >= 0),"Colder than absolute zero!"
AssertionError: Colder than absolute zero!
Toggling assertions

Normally, assertions are active. They are toggled by the internal variable __debug__.
Turning on optimization (by running python with the -O command-line argument)
turns assertions off. (Direct access to __debug__ is also possible, but not
recommended.)

Tip In assert statements, avoid using expressions with side effects. If the assertion
expression affects the data, then the "release" and "debug" versions of your scripts
may behave differently, leaving you with twice as much debugging to do.
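Here is a small Python 3 illustration of the tip above. The stack scenario and the name PopChecked are hypothetical. The real work (the pop) happens outside the assert, so running python -O removes only the check and leaves the behavior unchanged:

```python
def PopChecked(Stack):
    """Pop the top item from Stack, checking it with an assertion."""
    # Risky alternative: "assert Stack.pop() is not None" would skip the pop()
    # entirely under python -O, so optimized and debug runs would differ.
    Item = Stack.pop()                              # do the real work unconditionally...
    assert Item is not None, "None on the stack!"   # ...then only check it
    return Item

Stack = [1, 2]
print(PopChecked(Stack), Stack)
```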


Example: Game of Life

Listing 5-6 simulates John Conway’s Game of Life, a simple cellular automaton. The
game is played on a grid. Each cell of the grid can be “alive” or “dead.” Each
“generation,” cells live or die based on the state of their eight neighboring cells. Cells with
three living neighbors come to life. Live cells with two living neighbors stay alive.
All other cells die (or stay dead).
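If you store only the coordinates of live cells, the rules above fit in a few lines. This Python 3 sketch uses a different representation from Listing 5-6: a set of (x, y) tuples rather than a dictionary of 0/1 values. It assumes a square, wrap-around grid at least 3 cells wide. NextGeneration is a hypothetical name:

```python
def NextGeneration(LiveCells, Size):
    """Return the set of live (x, y) cells after one generation.

    The grid is Size x Size and wraps around at the edges ("donut world").
    """
    NeighborCounts = {}
    for (X, Y) in LiveCells:
        for DX in (-1, 0, 1):
            for DY in (-1, 0, 1):
                if DX == 0 and DY == 0:
                    continue  # a cell is not its own neighbor
                Neighbor = ((X + DX) % Size, (Y + DY) % Size)
                NeighborCounts[Neighbor] = NeighborCounts.get(Neighbor, 0) + 1
    # Three live neighbors: the cell lives (birth or survival).
    # Two live neighbors: the cell stays alive only if it already was.
    return set(Cell for Cell, Count in NeighborCounts.items()
               if Count == 3 or (Count == 2 and Cell in LiveCells))

Blinker = {(1, 0), (1, 1), (1, 2)}   # three cells in a vertical line
print(sorted(NextGeneration(Blinker, 5)))
```

A vertical "blinker" of three cells flips to horizontal and back again, which makes it a handy first test.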

Cross-Reference This example introduces a class to represent the playing field. For further
information on classes, see Chapter 7.


Listing 5-6: LifeGame.py 


# We arbitrarily set the field size to 10x10. Naming the size
# in uppercase implies that we shouldn't change its value.
FIELD_SIZE=10

# Create two strings for use as exceptions. We raise and catch
# these variables, instead of raw strings (which would be ==-
# equivalent, but possibly not is-equivalent).
STEADY_STATE="Steady state"
EVERYONE_DEAD="Everyone dead"

class PlayField:
    # Constructor. When creating a PlayField, initialize the
    # grid to be all dead:
    def __init__(self):
        self.LifeGrid={}
        for Y in range(FIELD_SIZE):
            for X in range(FIELD_SIZE):
                self.LifeGrid[(X,Y)]=0
    def SetAlive(self,X,Y):
        self.LifeGrid[(X,Y)]=1
    def SetDead(self,X,Y):
        self.LifeGrid[(X,Y)]=0
    def PrintGrid(self,Number):
        print "Generation",Number
        for Y in range(FIELD_SIZE):
            for X in range(FIELD_SIZE):
                # Trailing comma means don't print newline:
                print self.LifeGrid[(X,Y)],
            # Print newline at end of row:
            print
    def GetLiveNeighbors(self,X,Y):
        # The playing field is a "donut world," where the
        # edge cells join to the opposite edge.
        LeftColumn=X-1
        if (LeftColumn<0): LeftColumn=FIELD_SIZE-1
        RightColumn=(X+1) % FIELD_SIZE
        UpRow=Y-1
        if (UpRow<0): UpRow=FIELD_SIZE-1
        DownRow=(Y+1) % FIELD_SIZE
        LiveCount=(self.LifeGrid[(LeftColumn,UpRow)]+
                   self.LifeGrid[(X,UpRow)]+
                   self.LifeGrid[(RightColumn,UpRow)]+
                   self.LifeGrid[(LeftColumn,Y)]+
                   self.LifeGrid[(RightColumn,Y)]+
                   self.LifeGrid[(LeftColumn,DownRow)]+
                   self.LifeGrid[(X,DownRow)]+
                   self.LifeGrid[(RightColumn,DownRow)])
        return (LiveCount)
    def RunGeneration(self):
        NewGrid={}
        AllDeadFlag=1
        for Y in range(FIELD_SIZE):
            for X in range(FIELD_SIZE):
                CurrentState=self.LifeGrid[(X,Y)]
                LiveCount=self.GetLiveNeighbors(X,Y)
                if ((LiveCount==2 and CurrentState)
                    or (LiveCount==3)):
                    NewGrid[(X,Y)]=1
                    AllDeadFlag=0
                else:
                    NewGrid[(X,Y)]=0
        if (AllDeadFlag): raise EVERYONE_DEAD
        if self.LifeGrid==NewGrid: raise STEADY_STATE
        self.LifeGrid,OldGrid=NewGrid,self.LifeGrid
    def ShowManyGenerations(self,GenerationCount):
        try:
            for Cycle in range(GenerationCount):
                self.PrintGrid(Cycle)
                self.RunGeneration()
        except EVERYONE_DEAD:
            print "The population is now dead."
        except STEADY_STATE:
            print "The population is no longer changing."

if (__name__=="__main__"):
    # This first grid quickly settles into a pattern
    # that does not change.




    BoringGrid=PlayField()
    BoringGrid.SetAlive(2,2)
    BoringGrid.SetAlive(2,3)
    BoringGrid.SetAlive(2,4)
    BoringGrid.SetAlive(3,2)
    BoringGrid.ShowManyGenerations(50)
    # This grid contains a "glider" - a pattern of live
    # cells which moves diagonally across the grid.
    GliderGrid=PlayField()
    GliderGrid.SetAlive(0,0)
    GliderGrid.SetAlive(1,0)
    GliderGrid.SetAlive(2,0)
    GliderGrid.SetAlive(2,1)
    GliderGrid.SetAlive(1,2)
    GliderGrid.ShowManyGenerations(50)


Summary 

Python has several tools for controlling the flow of execution. In this chapter you: 

- Made decisions with if-statements.

- Set up repeating tasks with for-loops and while-loops.

- Built code that copes with problems by handling exceptions.

- Learned to add test scaffolding with assertions.

In the next chapter you’ll learn how to organize all your Python code into functions,
modules, and packages.

Python lets you break code down into reusable functions
and classes, then reassemble those components into 
modules and packages. The larger the project, the more useful 
this organization becomes. 

This chapter explains function definition syntax, module and 
package structure, and Python’s rules for visibility and scope. 


Defining Functions 

Here is a sample function definition: 

import string

def ReverseString(Forwards):
    """Convert a string to a list of characters, reverse the
    list, and join the list back into a string"""
    CharacterList=list(Forwards)
    CharacterList.reverse()
    return string.join(CharacterList,"")

The statement def FunctionName([parameters,...])
begins the function. Calling the function executes the code 
within the following indented block. 

A string following the def statement is a docstring. A docstring
is a string literal intended as documentation. Development
environments like IDLE display a function’s docstrings to show
how to call the function. Also, tools like HappyDoc can extract
docstrings from code to produce documentation. So, a
docstring is a good place to describe a function’s behavior,
parameter requirements, and the like. Modules can also have
a docstring: a string preceding any executable code is taken
to be the module’s description.
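Python stores a docstring on the function object as its __doc__ attribute, and that attribute is what such tools read. Here is a minimal Python 3 sketch. It uses the string method join in place of the string module's join function:

```python
def ReverseString(Forwards):
    """Convert a string to a list of characters, reverse the
    list, and join the list back into a string."""
    CharacterList = list(Forwards)
    CharacterList.reverse()
    return "".join(CharacterList)

print(ReverseString("stressed"))
print(ReverseString.__doc__)   # the docstring is kept on the function object
```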



In This Chapter

Defining functions

Grouping code with modules

Importing modules

Locating modules
The statement return [expression] exits a function, optionally passing back an
expression to the caller. A return statement with no arguments is the same as
return None. A function also exits (returning None) when the last statement
finishes, and execution “runs off the end of” the function code block.


Pass by object reference 

A Python variable is a reference to an object. Python passes function parameters
by value, but each value passed is an object reference. If you change what a parameter
refers to within a function, the change does not affect the function’s caller. For example:

>>> def StupidFunction(InputList):
    InputList=["I","Like","Cheese"]

>>> MyList=[1,2,3]
>>> StupidFunction(MyList)
>>> print MyList # MyList is unchanged!
[1, 2, 3]

The parameter InputList is local to the function StupidFunction. Changing InputList 
within the function does not affect MyList. The function accomplishes nothing. 

However, a function can change the object that a parameter refers to. For example, 
this function removes duplicate elements from a list: 

def RemoveDuplicates(InputList):
    ListIndex=-1
    # We iterate over the list from right to left, deleting
    # all duplicates of element -1, then -2, and so on. (Because
    # we are removing elements of the list, using negative
    # indices is convenient: element -3 is still element -3
    # after we delete some items preceding it.)
    while (-ListIndex<len(InputList)):
        # list.index() returns a positive index, so get the
        # positive equivalent of ListIndex and name it
        # CurrentIndex (same element, new index number).
        CurrentIndex=len(InputList)+ListIndex
        CurrentElement=InputList[ListIndex]
        # Keep removing duplicate elements as long as
        # an element precedes the current one.
        while (InputList.index(CurrentElement)<CurrentIndex):
            InputList.remove(CurrentElement)
            CurrentIndex=CurrentIndex-1
        ListIndex=ListIndex-1
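For contrast, here is a variant that keeps the first occurrence of each element, whereas RemoveDuplicates above keeps the last. It is written in Python 3, and its name is hypothetical. Note the slice assignment at the end. It changes the caller's list object, whereas assigning to InputList would only rebind the local name, exactly as in StupidFunction:

```python
def RemoveDuplicatesKeepFirst(InputList):
    """Remove repeated elements from InputList in place, keeping first occurrences."""
    Unique = []
    for Element in InputList:
        # A list membership test works even for unhashable elements such as lists.
        if Element not in Unique:
            Unique.append(Element)
    # Slice assignment mutates the caller's list; "InputList = Unique"
    # would only rebind the local name and leave the caller's list alone.
    InputList[:] = Unique

MyList = [1, 2, 1, 3, 1]
RemoveDuplicatesKeepFirst(MyList)
print(MyList)
```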


All about parameters 

A function parameter can have a default value. If a parameter has a default value, 
you do not need to supply a value to call the function. 
When you call a function, you can supply its parameters by name. It is legal to name
some parameters and not others — but after supplying the name for one parameter, 
you must name any other parameters you pass. 

For example, this function simulates the rolling of dice. By default, it rolls ordinary 
6-sided dice, one at a time: 

>>> import whrandom
>>> def RollDice(Dice=1,Sides=6):
    Total=0
    for Die in range(Dice):
        Total += whrandom.randint(1,Sides)
    return Total

>>> RollDice()
5
>>> RollDice(2) # Come on, snake-eyes!
8
>>> RollDice(2,4) # Roll two four-sided dice.
5
>>> RollDice(Sides=20) # Named parameter
17
>>> # After naming one parameter, you must name the rest:
>>> RollDice(Sides=5,4)
SyntaxError: non-keyword arg after keyword arg

A function evaluates its argument defaults only once. We recommend avoiding 
dynamic (or mutable) default values. For example, if you do not pass a value to this 
function, it will always print the time that you first called it: 

import time

def PrintTime(TimeStamp=time.time()):
    # time.time() is the current time in seconds,
    # time.localtime() puts the time into the
    # canonical tuple form, and time.asctime() converts
    # the time tuple to a cute string format.
    # The function's default argument, TimeStamp, does
    # not change between calls!
    print time.asctime(time.localtime(TimeStamp))

This improved version of the function prints the current time if another time is not 
provided: 

def PrintTime(TimeStamp=None):
    if (TimeStamp is None): TimeStamp=time.time()
    print time.asctime(time.localtime(TimeStamp))
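The same once-only evaluation causes trouble with mutable defaults such as lists, which is why the None idiom above is worth using. Here is a short Python 3 sketch with hypothetical function names:

```python
def AppendBad(Item, Bucket=[]):
    # The default list is built once, when the def statement runs,
    # so every call that omits Bucket shares the same list.
    Bucket.append(Item)
    return Bucket

def AppendGood(Item, Bucket=None):
    if Bucket is None:
        Bucket = []        # a fresh list for each call
    Bucket.append(Item)
    return Bucket

print(AppendBad(1), AppendBad(2))
print(AppendGood(1), AppendGood(2))
```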


Arbitrary arguments 

A function can accept an arbitrary sequence of parameters. The function collects 
these parameters into one tuple. This logging function shows the internal object IDs 
of a sequence of arguments: 
def LogObjectIDs(LogString, *args):
    print LogString
    for arg in args: print id(arg)

A function can also accept an arbitrary collection of named parameters. The func- 
tion collects these named parameters into one dictionary. This version of the log- 
ging function lets you give names to the objects passed in: 

def LogObjectIDs(LogString, **kwargs):
    print LogString
    for (ParamName,ParamValue) in kwargs.items():
        print "Object:",ParamName,"ID:",id(ParamValue)

To make a truly omnivorous function, you can take a dictionary of arbitrary named 
parameters and a tuple of unnamed parameters. 
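Here is a sketch of such an omnivorous function in Python 3. The name LogEverything is hypothetical. It returns its log lines instead of printing them, and it sorts the named arguments so the output order is predictable:

```python
def LogEverything(LogString, *args, **kwargs):
    """Return log lines describing every positional and named argument."""
    Lines = [LogString]
    for Arg in args:                  # args is a tuple of the unnamed arguments
        Lines.append("Positional: %r" % (Arg,))
    for Name in sorted(kwargs):       # kwargs is a dictionary of the named ones
        Lines.append("Named: %s=%r" % (Name, kwargs[Name]))
    return Lines

for Line in LogEverything("Starting up", 42, "cheese", Mode="fast", Retries=3):
    print(Line)
```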

Apply: passing arguments from a tuple 

The function apply(InvokeFunction,ArgumentSequence) calls the function
InvokeFunction, passing the elements of ArgumentSequence as arguments. The
usefulness of apply is that it breaks arguments out of a tuple cleanly, for any length of
tuple.

For example, assume you have a function SetColor(Red,Green,Blue), and a tuple 
representing a color: 

>>> print MyColor
(255, 0, 255)
>>> SetColor(MyColor[0],MyColor[1],MyColor[2]) # Kludgy!
>>> apply(SetColor,MyColor) # Same as above, but cleaner.
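Python 2.0 also added call syntax that does the same job as apply. An asterisk before a sequence argument unpacks it into positional arguments, and a double asterisk unpacks a dictionary into named arguments. Python 3 removed apply entirely, so this is the form to use there. The SetColor below is a hypothetical stand-in that just formats its arguments:

```python
def SetColor(Red, Green, Blue):
    # Hypothetical stand-in: format the three components as a hex color string.
    return "#%02x%02x%02x" % (Red, Green, Blue)

MyColor = (255, 0, 255)
print(SetColor(*MyColor))                                  # like apply(SetColor, MyColor)
print(SetColor(**{"Red": 0, "Green": 128, "Blue": 255}))   # named arguments from a dict
```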

A bit of functional programming 

Python can define new functions on the fly, giving you some of the functional
flexibility of languages like Lisp and Scheme.

You define an anonymous function with the lambda keyword. The syntax is lambda
[parameters,...]: <expression>. For example, here is an anonymous function
that filters list entries: 

>>> SomeNumbers=[5,10,15,3,18,2]
>>> filter(lambda x:x>10, SomeNumbers)
[15, 18]

This code uses anonymous functions to test for primes: 
def FindPrimes(EndNumber):
    NumList=range(2,EndNumber)
    Index=0
    while (Index<len(NumList)):
        NumList=filter(lambda y,x=NumList[Index]:
                       (y<=x or y%x!=0), NumList)
        Index += 1
    print NumList
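In Python 3, range and filter return lazy objects instead of lists, so a Python 3 version of FindPrimes needs explicit list() calls. This sketch also returns the list rather than printing it. The default argument x=NumList[Index] freezes the current prime into each lambda:

```python
def FindPrimes(EndNumber):
    """Return the primes below EndNumber by repeatedly filtering out multiples."""
    NumList = list(range(2, EndNumber))
    Index = 0
    while Index < len(NumList):
        # Keep y if it is no larger than the prime x, or not a multiple of x.
        NumList = list(filter(lambda y, x=NumList[Index]: y <= x or y % x != 0,
                              NumList))
        Index += 1
    return NumList

print(FindPrimes(30))
```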

Lambda functions can be helpful for event handling in programs with a GUI. For 
example, here is some code to add a button to a Tkinter frame. 

from Tkinter import Button

def AddCosmeticButton(ButtonFrame,ButtonLabel):
    Button(ButtonFrame,text=ButtonLabel,
           command=lambda l=ButtonLabel:LogUnimplemented(l)).pack()


Clicking the button causes it to call LogUnimplemented with the button label as an 
argument. Presumably, LogUnimplemented makes note of the fact that somebody is 
clicking a button that does nothing. 
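The same default-argument trick matters whenever lambdas are created in a loop. In the loop below, without l=ButtonLabel each callback would look up ButtonLabel only when it ran, so every button would see the last label. This Python 3 sketch replaces the Tkinter buttons with a plain list of callbacks. LogUnimplemented here is a hypothetical stand-in that records labels:

```python
Log = []

def LogUnimplemented(Label):
    # Hypothetical stand-in: record which button was clicked.
    Log.append(Label)

Callbacks = []
for ButtonLabel in ["Open", "Save", "Quit"]:
    # l=ButtonLabel copies the current label into the lambda when it is created.
    Callbacks.append(lambda l=ButtonLabel: LogUnimplemented(l))

for Callback in Callbacks:
    Callback()   # what clicking each button would do
print(Log)
```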



An anonymous function cannot be a direct call to print, because print is a statement
and lambda requires an expression.


Lambda functions have their own local namespace and cannot access variables 
other than those in their parameter list and those in the global namespace. 


Grouping Code with Modules 

A module is a file consisting of Python code. A module can define functions, classes, 
and variables. A module can also include runnable code. 

A stand-alone module is often called a script or program. You can use whichever
word you like, because Python makes no distinction between them.

Grouping related code into a module makes the code easier to understand and use. 
When writing a program, split off code into separate modules whenever a file starts 
becoming too large or performing too many different functions. 

Laying out a module 

The usual order for module elements is: 

- Docstring and/or general comments (revision log or copyright information,
and so on)

- Import statements (see below for more information on importing modules)

- Definitions of module-level variables (“constants”)

- Definitions of classes and functions

- Main function, if any

This organization is not required, but it works well and is widely used. 

Note People often store frequently used values in ALL_CAPS_VARIABLES to make later
code easier to maintain, or simply more readable. For example, the standard
library ftplib includes this definition:

    FTP_PORT = 21 # The standard FTP server control port

Such a variable is "constant by convention": Python does not forbid
modifications, but callers should not change its value.

Taking inventory of a module 

The function dir(module) returns a list of the variables, functions, and classes
defined in module. With no arguments, dir returns a list of all currently defined
names. dir(__builtins__) returns a list of all built-in names. For example:

>>> dir() # Just after starting Python
['__builtins__', '__doc__', '__name__']
>>> import sys
>>> dir()
['__builtins__', '__doc__', '__name__', 'sys']

You can pass any object (or class) to dir to get a list of its members.
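Here is a quick Python 3 sketch of dir applied to a module and to an ordinary object:

```python
import string

ModuleNames = dir(string)   # names defined in the string module
ListMembers = dir([])       # attributes (including methods) of a list object
print("digits" in ModuleNames, "append" in ListMembers)
```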


Importing Modules 

To use a module, you must first import it. Then, you can access the names in the 
module using dotted notation. For example: 

>>> string.digits # Invalid, because I haven't imported string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: There is no variable named 'string'
>>> import string # Note: No parentheses around module name.
>>> string.digits
'0123456789'

Another option is to import names from the module into the current namespace,
using the syntax from ModuleName import Name1, Name2. For example:

>>> from string import digits
>>> digits # Without a dot
'0123456789'
>>> string.digits # I don't know about the module, only digits.
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: There is no variable named 'string'

To bring every name from a module into the current namespace, use a blanket
import: from module import *. Importing modules this way can make for confusing
code, especially if two modules have functions with the same name. But it can also 
save a lot of typing. 

The import statements for a script should appear at the beginning of the file. (This
arrangement is not required, but importing halfway through a script is confusing.)


What else happens upon import?

Within a module, the special string variable __name__ is the name of the module.
When you execute a stand-alone module, its __name__ is always __main__. This
provides a handy way to set aside code that runs when you invoke a module, but not
when you import it. Some modules use this code as a test driver. (See Listing 6-1.)


Listing 6-1: Alpha.py 


import string

def Alphabetize(Str):
    "Alphabetize the letters in a string"
    CharList=list(Str)
    CharList.sort()
    return (string.join(CharList,""))

if (__name__=="__main__"):
    # This code runs when we execute the script, not when
    # we import it.
    X=string.upper("BritneySpears")
    Y=string.upper("Presbyterians")
    # Strange but true!
    print (Alphabetize(X)==Alphabetize(Y))
else:
    # This code runs when we import (not run) the module.
    print "Imported module Alpha"


Reimporting modules 

Once Python has imported a module, it doesn’t import it again for subsequent
import statements. You can force Python to “reimport” a module with a call to
reload(LoadedModule). This procedure is useful for debugging: you can edit a
module on disk, then reload it without having to restart an interactive interpreter
session.
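In Python 3, reload moved out of the built-ins and into the importlib module as importlib.reload. The sketch below simulates the edit-and-reload cycle. It writes a throwaway module (reload_demo, a hypothetical name) to a temporary directory, changes the file on disk, and reloads it:

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True   # keep this demo from caching compiled files
ScratchDir = tempfile.mkdtemp()
ModulePath = os.path.join(ScratchDir, "reload_demo.py")
sys.path.insert(0, ScratchDir)

with open(ModulePath, "w") as ModuleFile:
    ModuleFile.write("VERSION = 1\n")
importlib.invalidate_caches()    # make sure the import system notices the new file
import reload_demo
print(reload_demo.VERSION)

# "Edit the module on disk", then reload it without restarting Python.
with open(ModulePath, "w") as ModuleFile:
    ModuleFile.write("VERSION = 2  # edited\n")
reload_demo = importlib.reload(reload_demo)
print(reload_demo.VERSION)
```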

Exotic imports 

You can override standard import behavior by replacing the built-in function
__import__(name[, globals[, locals[, fromlist]]]). The import statement calls
this function behind the scenes, so installing your own version in the __builtin__
module changes how subsequent imports are carried out.

Caution We don't recommend overriding __import__, as it is a very low-level operation
for such a high-level language! See the libraries imp, ihooks, and rexec for
examples of overridden import behavior.


Locating Modules 

When you import a module, the Python interpreter searches for the module in the 
current directory. If the module isn’t found, Python then searches each directory in 
the PythonPath. If all else fails, Python checks the default path. On Windows, the
default path consists of C:\Python20\lib\ and some subdirectories; on UNIX, this
default path is normally /usr/local/lib/python/. (The code for Python’s standard
libraries is installed into the default path. Some modules, such as sys, are
built into the Python interpreter, and have no corresponding .py files.)

Python stores a list of directories that it searches for modules in the variable
sys.path.

Python path 

The PythonPath is an environment variable, consisting of a list of directories. Here
is a typical PythonPath from a Windows system:

set PYTHONPATH=c:\python20\lib;c:\python20\lib\proj1;c:\python20\lib\bob

And here is a typical PythonPath from a UNIX system: 

export PYTHONPATH=/home/stanner/python:/usr/bin/python/lib

I generally use a scratch folder to hold modules I am working on; other files I put in
the lib directory (or, if they are part of a package, in subdirectories). I find that
setting the PythonPath explicitly is most useful for switching between different
versions of a module.
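Because sys.path is an ordinary list, a program can also add a directory at run time instead of setting PYTHONPATH. This Python 3 sketch creates a throwaway module (scratch_demo, a hypothetical name) in a temporary directory that stands in for a scratch folder. It then puts that directory at the front of the search list:

```python
import importlib
import os
import sys
import tempfile

ScratchDir = tempfile.mkdtemp()   # stands in for a scratch folder of work-in-progress modules
with open(os.path.join(ScratchDir, "scratch_demo.py"), "w") as ModuleFile:
    ModuleFile.write("GREETING = 'hello from the scratch folder'\n")

# Directories earlier in sys.path are searched first.
sys.path.insert(0, ScratchDir)
importlib.invalidate_caches()
import scratch_demo
print(scratch_demo.__file__)
print(scratch_demo.GREETING)
```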
Compiled files

You can compile a Python program into system-independent bytecodes. The
interpreter stores the compiled version of a module in a corresponding file with a .pyc
extension. This precompiled file runs at the same speed, but loads faster because
Python need not parse the source code. Files compiled with the optimization flag
on are named with a .pyo extension, and behave like .pyc files.

When you import a module foo, Python looks for a compiled version of foo: a file
named foo.pyc that is as new as foo.py. If it finds one, Python loads foo.pyc
instead of re-parsing foo.py. If not, Python parses foo.py, and writes out the
compiled version to foo.pyc.

Note When you run a script from the command line, Python does not create (or look
for) a precompiled version. To save some parsing time, you can invoke a short
"stub" script that imports the main module. Or, you can compile the main script by
hand (by importing it, by calling py_compile.compile(ScriptFileName), or
by calling compileall.compile_dir(ScriptDirectoryName)), then invoke
the .pyc file directly. However, be sure to precompile the script again when you
change it!
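Here is a sketch of compiling a file by hand with py_compile in Python 3. Python 3.2 and later store compiled files in a __pycache__ subdirectory with the interpreter version in the file name, rather than as foo.pyc next to foo.py. The module name foo is hypothetical:

```python
import os
import py_compile
import tempfile

SourcePath = os.path.join(tempfile.mkdtemp(), "foo.py")
with open(SourcePath, "w") as SourceFile:
    SourceFile.write("print('hello from foo')\n")

# Compile by hand; in Python 3 this returns the path of the compiled file it wrote.
# doraise=True turns compile errors into exceptions instead of printed messages.
CompiledPath = py_compile.compile(SourcePath, doraise=True)
print(CompiledPath)
```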


Understanding Scope Rules 

Variables are names (identifiers) that map to objects. A namespace is a dictionary
of variable names (keys) and their corresponding objects (values). A Python
statement can access variables in a local namespace and in the global namespace. If
(heaven forfend!) a local and a global variable have the same name, the local
variable shadows the global variable.

Each function has its own local namespace. Class methods follow the same scoping
rule as ordinary functions. Methods access object attributes via the self argument;
attributes are not brought separately into the namespace.

At the module level, or in an interactive session, the local namespace is the same as
the global namespace. For purposes of an eval, exec, execfile, or input
statement, the local namespace is the same as the caller’s.

Is it local or global? 

Python makes educated guesses on whether variables are local or global. It 
assumes that any variable assigned a value in a function is local. Therefore, in order 
to assign a value to a global variable within a function, you must first use the global 
statement. The statement global VarName tells Python that VarName is a global
variable. Python stops searching the local namespace for the variable.
For example, Listing 6-2 defines a variable NumberOfMonkeys in the global
namespace. Within the function AddMonkey, we assign NumberOfMonkeys a value;
therefore, Python assumes NumberOfMonkeys is a local variable. However, we
access the value of the local variable NumberOfMonkeys before setting it, so an
UnboundLocalError is the result. Uncommenting the global statement fixes the
problem.


Listing 6-2: Monkeys.py 


NumberOfMonkeys = 11
def AddMonkey():
    # Uncomment the following line to fix the code:
    #global NumberOfMonkeys
    NumberOfMonkeys = NumberOfMonkeys + 1

print NumberOfMonkeys
AddMonkey()
print NumberOfMonkeys
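Here is Listing 6-2 with both versions side by side, written in Python 3. AddMonkeyBroken is a hypothetical name for the unfixed function:

```python
NumberOfMonkeys = 11

def AddMonkeyBroken():
    # Assigning to the name anywhere in the function makes it local,
    # so reading it on the right-hand side fails.
    NumberOfMonkeys = NumberOfMonkeys + 1

def AddMonkey():
    global NumberOfMonkeys
    NumberOfMonkeys = NumberOfMonkeys + 1

try:
    AddMonkeyBroken()
except UnboundLocalError as Problem:
    print("Broken version:", Problem)

AddMonkey()
print(NumberOfMonkeys)
```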


Listing namespace contents 

The built-in functions locals and globals return local and global namespace
contents in dictionary form. These operations are handy for debugging.
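Here is a tiny Python 3 sketch, with hypothetical names, showing what locals and globals return inside a function:

```python
Counter = 0

def Snapshot(Argument):
    Doubled = Argument * 2
    # locals() maps this call's local names to their values;
    # globals() is the dictionary for the module's namespace.
    return sorted(locals().items()), "Counter" in globals()

print(Snapshot(3))
```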


Grouping Modules into Packages 

You can group related modules into a package. Packages can also contain
subpackages, and sub-subpackages, and so on. You access modules inside a package using
dotted notation; for example, seti.log.FlushLogFile() calls the function
FlushLogFile in the module log in the package seti.

Python locates packages by looking for a directory containing a file named
__init__.py. The directory can be a subdirectory of any directory in sys.path.
The directory name is the package name.

The script __init__.py runs when the package is imported. It can be an empty
file, but should probably at least contain a docstring. It may also define the special
variable __all__, which governs the behavior of a blanket import of the form from
PackageName import *. If defined, __all__ is a list of names of modules to bring into
the current namespace. If the script __init__.py does not define __all__, then a
blanket import brings into the current namespace only the names defined and
modules imported in __init__.py.
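The following Python 3 sketch builds a throwaway package in a temporary directory and uses __all__ to control what a blanket import brings in. It mirrors the seti.log example with hypothetical names (seti_demo, log, FlushLogFile):

```python
import importlib
import os
import sys
import tempfile

Root = tempfile.mkdtemp()
PackageDir = os.path.join(Root, "seti_demo")
os.mkdir(PackageDir)
with open(os.path.join(PackageDir, "__init__.py"), "w") as InitFile:
    # A docstring plus __all__, which names the modules a blanket import loads.
    InitFile.write('"Demo package."\n__all__ = ["log"]\n')
with open(os.path.join(PackageDir, "log.py"), "w") as LogFile:
    LogFile.write("def FlushLogFile():\n    return 'flushed'\n")

sys.path.insert(0, Root)
importlib.invalidate_caches()
from seti_demo import *   # because of __all__, this imports the log module
print(log.FlushLogFile())
```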

Cross-Reference See Chapter 36 for information on how to install new modules and packages, and
how to distribute your own code.


Compiling and Running Programmatically 

The exec statement can run an arbitrary chunk of Python code. The syntax is exec
ExecuteObject [in GlobalDict[, LocalDict]]. ExecuteObject is a string, file
object, or code object containing Python code. GlobalDict and LocalDict are
dictionaries used for the global and local namespaces, respectively. Both GlobalDict and
LocalDict are optional. If you omit LocalDict, it defaults to GlobalDict. If you omit
both, the code runs using the current namespaces.

The eval function evaluates a Python expression. The syntax is
eval(ExpressionObject[, GlobalDict[, LocalDict]]). ExpressionObject is a string
or a code object; GlobalDict and LocalDict have the same semantics as for exec.

The execfile function has the same syntax as exec, except that it takes a file
name instead of an execute object. 

These functions raise an exception if they encounter a syntax error.

The compile function transforms a code string into a runnable code object, which
you can then pass to exec or eval. The syntax is
compile(CodeString,FileName,Kind). CodeString is a string of Python code.
FileName is a string describing the code’s origin; if Python read the code from a file,
FileName should be the name of that file. Kind is a string describing the code:

- “exec”: one or more executable statements

- “eval”: a single expression

- “single”: a single statement, which is printed upon evaluation if not None

Note Multiline expressions should have two trailing newlines in order for Python to pass
them to compile or exec. (This requirement is a quirk of Python that may be
fixed in a later version.)
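In Python 3, exec became a function, so the in GlobalDict part is written as a second argument. Here is a short Python 3 sketch of compile, exec, and eval working with explicit namespace dictionaries:

```python
Namespace = {}
Code = compile("Total = 0\nfor N in range(5):\n    Total += N\n", "<generated>", "exec")
exec(Code, Namespace)   # Namespace serves as the global dictionary
print(Namespace["Total"])

Area = eval("Width * Height", {"Width": 3, "Height": 4})
print(Area)

try:
    compile("Total = = 1", "<generated>", "exec")
except SyntaxError as Problem:
    print("Syntax error in", Problem.filename)
```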
Summary

Program organization helps make code reusable, as well as more easily
comprehended. In this chapter you:

- Defined functions with variable argument lists.

- Organized code into modules and packages.

- Compiled and ran Python code on the fly.

In the next chapter you’ll harness the power of object-oriented programming in 
Python. 


Object-Oriented Python




Python has been an object-oriented language from day
one. Because of this, creating and using classes and
objects is downright easy. This chapter helps you become an
expert in using Python’s object-oriented programming support.

Overview of Object-Oriented Python

If you don’t have any previous experience with object-oriented
(OO) programming, you may want to consult an introductory
course on it or at least a tutorial of some sort so that you have
a grasp of the basic concepts.

Python’s object-oriented programming support is very
straightforward and easy: you create classes (which are
something akin to blueprints), and you use them to create instance
objects (which are like the usable and finished versions of
what the blueprints represent).

An instance object (or just “object,” for short) can have any 
number of attributes, which include data members (variables 
belonging to that object) and methods (functions belonging to 
that object that operate on that objecfs data). 

You can create a new class by deriving it from one or more 
other classes. The new child class, or subclass, inherits the 
attributes of its parent classes, but it may override any of 
the parent’s attributes as well as add additional attributes 
of its own. 


In This Chapter

Overview of object-oriented Python
Creating classes and instance objects
Deriving new classes from other classes
Hiding private data
Identifying class membership
Overloading standard behaviors
Using weak references
Creating Classes and Instance Objects

Below is a sample class and an example of its use: 

>>> class Wallet:
        "Where does my money go?"
        walletCnt = 0
        def __init__(self, balance=0):
            self.balance = balance
            Wallet.walletCnt += 1
        def getPaid(self, amnt):
            self.balance += amnt
            self.displayBalance()
        def spend(self, amnt):
            self.balance -= amnt
            self.displayBalance()
        def displayBalance(self):
            print 'New balance: $%.2f' % self.balance

The class statement creates a new class definition (which is itself also an object) called Wallet. The class has a documentation string (which you can access via Wallet.__doc__), a count of all the wallets in existence, and four methods.

You declare methods like normal functions, except that the first argument to each method is self, the conventional Python name for the instance (it has the same role as the this reference in Java or the this pointer in C++). Python adds the self argument to the list for you; you don’t need to include it when you call the methods. The first method, __init__, is a special constructor or initialization method that Python calls when you create a new instance of this class. Note that it accepts an initial balance as an optional parameter. The getPaid and spend methods change the wallet’s current balance, and displayBalance prints it.

Note All methods must operate on an instance of the object (if you’re coming from C++, Python 2.1 has no "static methods").

Objects can have two types of data members. The first, walletCnt, is defined outside of any method of the class, which makes it a class variable: all instances of the class share it. Changing its value through the class itself (as __init__ does with Wallet.walletCnt) changes it everywhere, so any wallet can use walletCnt to see how many wallets you’ve created:

>>> myWallet = Wallet(); yourWallet = Wallet()
>>> print myWallet.walletCnt, yourWallet.walletCnt
2 2
The other type of data member is an instance variable, which is one defined inside a method and belongs only to the current instance of the object. The balance member of Wallet is an instance variable. So that you’re never confused as to what belongs to an object, you must use the self parameter to refer to its attributes, whether they are methods or data members.
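For comparison, here is a sketch of the same Wallet class in Python 3 syntax (print is a function there, while the sessions in this chapter use Python 2.1). The short usage at the end is illustrative, not from the original session, and shows that walletCnt is shared by every wallet while each wallet keeps its own balance.

```python
class Wallet:
    """Where does my money go?"""
    walletCnt = 0  # Class variable: shared by every Wallet.

    def __init__(self, balance=0):
        self.balance = balance   # Instance variable: one per Wallet.
        Wallet.walletCnt += 1    # Update the shared counter through the class.

    def getPaid(self, amnt):
        self.balance += amnt
        self.displayBalance()

    def spend(self, amnt):
        self.balance -= amnt
        self.displayBalance()

    def displayBalance(self):
        print('New balance: $%.2f' % self.balance)


myWallet = Wallet()
yourWallet = Wallet(20)
yourWallet.spend(5)                           # New balance: $15.00
print(Wallet.walletCnt)                       # 2
print(myWallet.balance, yourWallet.balance)   # 0 15
```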

Creating instance objects 

To create an instance of a class, you “call” the class and pass in whatever arguments its __init__ method accepts, and you access the object’s attributes using the dot operator:

>>> w = Wallet(50.00)
>>> w.getPaid(100.00)
New balance: $150.00
>>> w.spend(25.0)
New balance: $125.00
>>> w.balance
125.0



An instance of a class uses a dictionary (named __dict__) to hold the attributes and values specific to that instance. Thus, for an attribute stored on the instance itself, object.attribute is the same as object.__dict__['attribute']. Additionally, each object and class has a few other special members:

>>> Wallet.__name__      # Class name
'Wallet'
>>> Wallet.__module__    # Module in which class was defined
'__main__'
>>> w.__class__          # Class definition for this object
<class __main__.Wallet at 010C1CF0>
>>> w.__doc__            # Doc string
'Where does my money go?'
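The following Python 3 sketch uses a trimmed-down, hypothetical version of Wallet to make the __dict__ relationship visible: attributes assigned through self appear in the instance dictionary, while class variables such as walletCnt live in the class and are found there when the instance dictionary lacks the name.

```python
class Wallet:
    """Where does my money go?"""
    walletCnt = 0                 # class variable

    def __init__(self, balance=0):
        self.balance = balance    # instance variable

w = Wallet(50.0)

# Instance attributes live in the instance's own __dict__ ...
print(w.__dict__)                          # {'balance': 50.0}
print(w.balance == w.__dict__['balance'])  # True

# ... while class variables live in the class's __dict__ and are found
# there when the instance dictionary lacks the name.
print('walletCnt' in w.__dict__)           # False
print('walletCnt' in Wallet.__dict__)      # True
print(w.walletCnt)                         # 0

print(Wallet.__name__, w.__class__.__name__)  # Wallet Wallet
print(w.__doc__)                              # Where does my money go?
```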


More on accessing attributes 

You can add, remove, or modify attributes of classes and objects at any time:

>>> w.owner = 'Dave'    # Add an 'owner' attribute.
>>> w.owner = 'Bob'     # Bob stole my wallet.
>>> del w.owner         # Remove the 'owner' attribute.

Modifying a class definition affects all instances of that class:

>>> Wallet.color = 'blue'    # Add a class variable.
>>> w.color
'blue'


102 Part I > The Python Language 


Note that when an instance modifies a class variable without naming the class, it’s really only creating a new instance attribute and modifying it:

>>> w.color = 'red'    # You might think you're changing the
>>> Wallet.color       # class variable, but you're not!
'blue'

Because you can modify a class instance at any time, a class is a great way to mimic a more flexible version of a C struct:

class myStruct: pass
z = myStruct()
z.whatever = 'howdy'

Instead of using the normal statements to access attributes, you can use the getattr(obj, name[, default]), hasattr(obj, name), setattr(obj, name, value), and delattr(obj, name) functions:

>>> hasattr(w, 'color')      # Does w.color exist?
1
>>> getattr(w, 'color')      # Return w.color please.
'red'
>>> setattr(w, 'size', 10)   # Same as 'w.size = 10'.
>>> delattr(w, 'color')      # Same as 'del w.color'.
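These functions are most useful when the attribute name is only known at run time, for example when it comes from a configuration file or user input. A minimal Python 3 sketch (Python 3 prints True and False where the session above shows 1 and 0); the class and attribute names are illustrative:

```python
class Wallet:
    pass

w = Wallet()

# setattr/getattr take the attribute name as a string.
for name, value in [('color', 'red'), ('size', 10)]:
    setattr(w, name, value)

print(getattr(w, 'color'))             # red
print(getattr(w, 'owner', 'nobody'))   # nobody: the default avoids AttributeError
print(hasattr(w, 'size'))              # True

delattr(w, 'color')                    # same as: del w.color
print(hasattr(w, 'color'))             # False
```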

As with functions, methods can also have data attributes. The method of the following class, for example, includes an HTML docstring for use with a Web browser-based class browser:

>>> class SomeClass:
        def deleteFiles(self, mask):
            os.destroyFiles(mask)    # Hypothetical function; never called here.
        deleteFiles.htmldoc = '<bold>Use with care!</bold>'

>>> hasattr(SomeClass.deleteFiles, 'htmldoc')
1
>>> SomeClass.deleteFiles.htmldoc
'<bold>Use with care!</bold>'

You can read more about function attributes in Chapter 6. 


New Feature Method attributes are new in Python 2.1.

Deriving New Classes from Other Classes 

Instead of starting from scratch, you can create a class by deriving it from a preexisting class; just list the parent class in parentheses after the new class name:

>>> class GetAwayVehicle:
        topSpeed = 200
        def engageSmokeScreen(self):
            print '<Cough!>'
        def fire(self):
            print 'Bang!'

>>> class SuperMotorcycle(GetAwayVehicle):
        topSpeed = 250
        def engageOilSlick(self):
            print 'Enemies destroyed.'
        def fire(self):
            GetAwayVehicle.fire(self)    # Use method in parent.
            print 'Kapow!'

The child class (SuperMotorcycle) inherits the attributes of its parent class (GetAwayVehicle), and you can use those attributes as if they were defined in the child class:

>>> myBike = SuperMotorcycle()
>>> myBike.engageSmokeScreen()
<Cough!>
>>> myBike.engageOilSlick()
Enemies destroyed.

A child class can override data members and methods from the parent. For example, the value of topSpeed in the child overrides the one in the parent:

>>> myBike.topSpeed
250

The fire method doesn’t just override the original version in the parent; it also calls the parent version:

>>> myBike.fire()
Bang!
Kapow!
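Modern Python code usually writes the explicit GetAwayVehicle.fire(self) call with the built-in super(), which did not exist in Python 2.1 (it arrived in Python 2.2, and Python 3 lets you call it with no arguments). A Python 3 sketch, in which fire returns a list of sounds instead of printing them so the result is easy to check:

```python
class GetAwayVehicle:
    topSpeed = 200

    def fire(self):
        return ['Bang!']

class SuperMotorcycle(GetAwayVehicle):
    topSpeed = 250

    def fire(self):
        # super() finds the next class in the method resolution order,
        # so the parent class name is not hard-coded here.
        return super().fire() + ['Kapow!']

myBike = SuperMotorcycle()
print(myBike.topSpeed)   # 250: the child's class variable overrides the parent's
print(myBike.fire())     # ['Bang!', 'Kapow!']
```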


Multiple inheritance 

When deriving a new child class, you aren’t limited to a single parent class:

>>> class Glider:
        def extendWings(self):
            print 'Wings ready!'
        def fire(self):
            print 'Bombs away!'

>>> class FlyingBike(Glider, SuperMotorcycle):
        pass

In this case a FlyingBike enjoys all the benefits of being both a Glider and a SuperMotorcycle (which is also a GetAwayVehicle). When searching for an attribute not defined in a child class, Python does a left-to-right, depth-first search on the base classes until it finds a match. If you call fire on a FlyingBike, it drops bombs, because first and foremost, it’s a Glider:

>>> betterBike = FlyingBike()
>>> betterBike.fire()
Bombs away!

You can get a tuple of base classes using the __bases__ member of the class definition object:

>>> for base in FlyingBike.__bases__:
        print base

__main__.Glider             # __main__ is the module in
__main__.SuperMotorcycle    # which you defined the class.
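In Python 3 every class records its complete lookup order in the __mro__ attribute. Python 3 computes that order with the C3 linearization algorithm rather than the plain left-to-right, depth-first search described above, although for a simple hierarchy like this one the two give the same result. A sketch that rebuilds the classes with only their fire methods (returning strings instead of printing):

```python
class GetAwayVehicle:
    def fire(self):
        return 'Bang!'

class SuperMotorcycle(GetAwayVehicle):
    def fire(self):
        return 'Kapow!'

class Glider:
    def fire(self):
        return 'Bombs away!'

class FlyingBike(Glider, SuperMotorcycle):
    pass

print([cls.__name__ for cls in FlyingBike.__mro__])
# ['FlyingBike', 'Glider', 'SuperMotorcycle', 'GetAwayVehicle', 'object']
print([cls.__name__ for cls in FlyingBike.__bases__])  # ['Glider', 'SuperMotorcycle']
print(FlyingBike().fire())                              # Bombs away!
```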

Tip Just because multiple inheritance lets you have child classes with many parents 

(and other strange class genealogies) doesn't always mean it's a good idea. If your 
design calls for more than a few direct parent classes, chances are you need a new 
design. 

Multiple inheritance really shines with mix-ins, which are small classes that override a portion of another class to customize behavior. The SocketServer module, for example, defines a generic TCP socket server class called TCPServer that handles a single connection at a time. The module also provides several mix-ins, including ForkingMixIn and ThreadingMixIn, that provide their own process_request method. This lets the TCPServer code remain simple while making it easy to create multi-threaded or multi-process socket server classes:

class ThreadingServer(ThreadingMixIn, TCPServer): pass
class ForkingServer(ForkingMixIn, TCPServer): pass

Furthermore, you can use the same threading and forking code to create other types of servers:

class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass

See Chapter 15 for information on networking and socket servers.
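The mix-in idea does not depend on networking. The following Python 3 sketch uses made-up classes (Greeter, ShoutingMixIn, and LoggingMixIn) to show mix-ins that each override one method and then hand off to the next class in the lookup order with super(); listing a mix-in before the base class is what lets its version win.

```python
class Greeter:
    """A simple base class with one overridable method."""
    def greet(self, name):
        return 'Hello, %s' % name

class ShoutingMixIn:
    """Mix-in that customizes greet() and defers to the next class for the text."""
    def greet(self, name):
        return super().greet(name).upper() + '!'

class LoggingMixIn:
    """Mix-in that records each name before delegating."""
    def greet(self, name):
        self.log = getattr(self, 'log', []) + [name]
        return super().greet(name)

class ShoutingGreeter(ShoutingMixIn, Greeter): pass
class LoggingShoutingGreeter(LoggingMixIn, ShoutingMixIn, Greeter): pass

print(ShoutingGreeter().greet('Bob'))   # HELLO, BOB!
g = LoggingShoutingGreeter()
print(g.greet('Ann'), g.log)            # HELLO, ANN! ['Ann']
```

In Python 3 the SocketServer module is named socketserver, and it already defines ThreadingTCPServer by combining ThreadingMixIn and TCPServer in the same way.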


Creating a custom list class 

The UserList class (in the UserList module) provides a listlike base class that you can extend to suit your needs. UserList accepts a list to use as an initializer, and internally you can access the actual Python list via the data member. The following example creates an object that behaves like an ordinary list except that it also provides a method to randomly reorder the items in the list:

>>> import UserList, whrandom
>>> from whrandom import randint
>>> class MangleList(UserList.UserList):
        def mangle(self):
            data = self.data
            count = len(data)
            for i in range(count):
                data.insert(randint(0, count-1), data.pop())

>>> z = MangleList([1,2,3,4,5])
>>> z.mangle() ; print z
[1, 3, 5, 4, 2]
>>> z.mangle() ; print z
[5, 4, 1, 2, 3]
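In Python 3 the UserList, UserString, and UserDict classes live in the collections module, and the whrandom module no longer exists. Here is a sketch of MangleList under those assumptions, using random.randint; if you just need a uniformly shuffled list, random.shuffle is the usual tool.

```python
import random
from collections import UserList

class MangleList(UserList):
    def mangle(self):
        # Same idea as the session above: repeatedly pop the last item and
        # reinsert it at a random position (this is not a uniform shuffle).
        data = self.data
        count = len(data)
        for i in range(count):
            data.insert(random.randint(0, count - 1), data.pop())

z = MangleList([1, 2, 3, 4, 5])
z.mangle()
print(z)            # the order varies from run to run
print(sorted(z))    # [1, 2, 3, 4, 5]: the items themselves are unchanged
```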

Creating a custom string class 

You can also create your own custom string behaviors using the UserString class in the UserString module. As with UserLists and lists, a UserString looks and acts a lot like a normal string object:

>>> from UserString import *
>>> s = UserString('Goal!')
>>> s.data        # Access the underlying Python string.
'Goal!'
>>> s
'Goal!'
>>> s.upper()
'GOAL!'
>>> s[2]
'a'

Of course, the whole point of having the UserString class is so that you can subclass it. As an example, the UserString module also provides the MutableString class:

>>> m = MutableString('2 + 2 is 5')
>>> m
'2 + 2 is 5'
>>> m[9] = '4'
>>> m
'2 + 2 is 4'

MutableString does its magic by overriding (among other things) the __setitem__ method, which is a special method Python calls to handle the index-based assignment in the example above. We cover __setitem__ and other special methods in the “Overloading Standard Behaviors” section later in this chapter.
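MutableString was removed in Python 3, but collections.UserString still keeps its text in a data attribute, so the technique still works. This sketch adds a __setitem__ method to a hypothetical UserString subclass; because Python strings are immutable, each assignment rebuilds self.data from a list of characters.

```python
from collections import UserString

class SimpleMutableString(UserString):
    """Hypothetical minimal mutable string: supports obj[i] = 'c'."""
    def __setitem__(self, index, value):
        chars = list(self.data)    # str is immutable, so edit a list copy
        chars[index] = value
        self.data = ''.join(chars)

m = SimpleMutableString('2 + 2 is 5')
m[9] = '4'
print(m)           # 2 + 2 is 4
print(m.upper())   # 2 + 2 IS 4 (inherited UserString methods still work)
```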
Creating a custom dictionary class

And finally, Python also has the UserDict class in the UserDict module so that you can create your own subclasses of dictionaries:

>>> from UserDict import *
>>> d = UserDict({1:'one',2:'two',3:'three'})
>>> d
{3: 'three', 2: 'two', 1: 'one'}
>>> d.data
{3: 'three', 2: 'two', 1: 'one'}
>>> d.has_key(3)
1

The following example creates a dictionary object that, instead of raising an exception, returns None if you try to use a nonexistent key:

>>> from UserDict import *
>>> class NoFailDict(UserDict):
        def __getitem__(self, key):
            try:
                value = self.data[key]
            except KeyError:
                value = None
            return value

>>> q = NoFailDict({'orange':'0xFF6432','yellow':'0xFFFF00'})
>>> print q['orange']
0xFF6432
>>> print q['blue']
None
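In Python 3, UserDict lives in the collections module, and subclasses of both dict and UserDict can define a __missing__ method, which Python calls only when a lookup fails. Here is a sketch of the same NoFailDict idea built on that hook instead of a full __getitem__ override:

```python
class NoFailDict(dict):
    """A dict that returns None instead of raising KeyError."""
    def __missing__(self, key):
        # dict.__getitem__ calls this only after the normal lookup fails.
        return None

q = NoFailDict({'orange': '0xFF6432', 'yellow': '0xFFFF00'})
print(q['orange'])    # 0xFF6432
print(q['blue'])      # None
print('blue' in q)    # False: __missing__ does not insert the key
```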


Hiding Private Data 

In other object-oriented languages such as C++ or Java, an object’s attributes may or may not be visible outside the class definition (you can say a member is public, private, or protected). Such conventions help keep the implementation details hidden and force you to work with objects through well-defined interfaces.

Python, however, takes more of a minimalist approach and assumes you know what you’re doing when you try to access attributes of an object. Python programs usually have smaller and more straightforward implementations than their C++ or Java counterparts, so private data members aren’t as useful or necessary (although if you’re accustomed to using them you may feel a little “overexposed” for a while).

Having said that, there still may come a time when you really don’t want users of an object to have access to the implementation, or maybe you have some members in a base class that you don’t want child classes to access. For these cases, you can name attributes with a double underscore prefix, and those attributes will not be directly visible to outsiders:
>>> class FooCounter:
        __secretCount = 0
        def foo(self):
            self.__secretCount += 1
            print self.__secretCount

>>> foo = FooCounter()
>>> foo.foo()
1
>>> foo.foo()
2
>>> foo.__secretCount
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
AttributeError: 'FooCounter' instance has no attribute '__secretCount'

Python protects those members by internally changing the name to include the class name. You can be sneaky and thwart this convention (valid reasons for doing this are rare!) by referring to the attribute using its mangled name, _ClassName__attrName:

>>> foo._FooCounter__secretCount
2
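The following Python 3 sketch checks the mangling rule directly instead of provoking a traceback: hasattr shows that the plain name is not visible from outside, and vars (which returns an object's __dict__) shows the mangled name stored on the class. The foo method returns the count rather than printing it so the results are easy to check.

```python
class FooCounter:
    __secretCount = 0   # stored on the class as _FooCounter__secretCount

    def foo(self):
        self.__secretCount += 1   # compiled as self._FooCounter__secretCount
        return self.__secretCount

foo = FooCounter()
foo.foo()
print(foo.foo())                                        # 2
print(hasattr(foo, '__secretCount'))                    # False
print(foo._FooCounter__secretCount)                     # 2
print('_FooCounter__secretCount' in vars(FooCounter))   # True
```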


Identifying Class Membership 

Class definitions and instance objects each have their own data type:

>>> class Tree:
        pass

>>> class Oak(Tree):
        pass

>>> seedling = Oak()
>>> type(seedling); type(Oak)
<type 'instance'>
<type 'class'>

Refer to Chapter 4 for more on identifying the data types of an object. 


Because the type is instance or class, all class definitions have the same type and all instance objects have the same type. If you want to see if an object is an instance of a particular class, you can use the isinstance(obj, class) function:

>>> isinstance(seedling, Oak)
1
>>> isinstance(seedling, Tree)   # True because an Oak is a Tree.
1

The issubclass(class1, class2) function checks to see if class1 is a descendant of class2:





>>> issubclass(Oak, Tree)
1
>>> issubclass(Tree, Oak)
0

You can also retrieve the string name for a class using the __name__ member:

>>> seedling.__class__.__name__
'Oak'
>>> seedling.__class__.__bases__[0].__name__
'Tree'

Your programs will often be more flexible if, instead of depending on an object’s type or class, they check to see if an object has a needed attribute. This enables you and others to use your code with data types that you didn’t necessarily consider when you wrote it. For example, instead of checking to see if an object passed in is a file before you write to it, just check for a write method, and if present, use it. Later you may find it useful to call the same routine passing in some other object that also has a write method. “Using Filelike Objects” in Chapter 8 covers this theme in more detail.
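As a sketch of that idea in Python 3, the hypothetical writeReport function below accepts any object that has a write method, so the same code works with sys.stdout and with an in-memory io.StringIO buffer. The hasattr check is optional, since simply calling write on an unsuitable object would raise AttributeError anyway.

```python
import io
import sys

def writeReport(out, lines):
    """Write lines to any object with a write method (hypothetical helper)."""
    if not hasattr(out, 'write'):
        raise TypeError('expected an object with a write() method')
    for line in lines:
        out.write(line + '\n')

buffer = io.StringIO()                  # an in-memory, filelike object
writeReport(buffer, ['alpha', 'beta'])
print(repr(buffer.getvalue()))          # 'alpha\nbeta\n'
writeReport(sys.stdout, ['to the console'])
```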


Overloading Standard Behaviors 

Suppose you’ve created a Vector class to represent two-dimensional vectors. What happens when you use the plus operator to add them? Most likely Python will yell at you. You could, however, define the __add__ method in your class to perform vector addition, and then the plus operator would behave as expected:

>>> class Vector:
        def __init__(self, a, b):
            self.a = a
            self.b = b
        def __str__(self):
            return 'Vector(%d,%d)' % (self.a, self.b)
        def __add__(self, other):
            return Vector(self.a+other.a, self.b+other.b)

>>> v1 = Vector(2,10)
>>> v2 = Vector(5,-2)
>>> print v1 + v2
Vector(7,8)

Not only do users now have an intuitive way to add two vectors (much better than having them call some clunky function directly), but vectors also display themselves nicely when converted to strings (thanks to the __str__ method).

The operator module defines functions corresponding to many of Python’s operators, and you can overload most of these operations, defining new behavior for them when they are used with your classes. The following sections describe the special methods involved and how to use them.
Note that some operations have two or even three very similar methods. For example, in the numeric operators, you can create an __add__ method, an __iadd__ method, and an __radd__ method, all for addition. The first implements normal addition (x + y), the second in-place addition (x += y), and the third x + y when x does not have an __add__ method (so Python calls y.__radd__(x) instead). If you don’t define in-place operator methods, Python checks for an overloaded version of the normal operator (for example, if you don’t define __iadd__, x += y causes Python to still call __add__ if defined). For simplicity, it’s best to leave the in-place operators undefined unless your class in some way benefits from special in-place processing (such as a huge matrix class that could save memory by performing addition in place).
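A small Python 3 sketch of the in-place fallback, using a Vector class loosely based on the one above and a hypothetical InPlaceVector subclass: without __iadd__, += builds a brand-new object through __add__; with __iadd__, the existing object is updated. Note that __iadd__ must return the object to bind to the name, normally self.

```python
class Vector:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __add__(self, other):
        return Vector(self.a + other.a, self.b + other.b)   # always a new object

class InPlaceVector(Vector):
    def __iadd__(self, other):
        self.a += other.a    # update this object instead of building a new one
        self.b += other.b
        return self          # += rebinds the name to whatever __iadd__ returns

v = Vector(2, 10)
v_before = v
v += Vector(5, -2)           # no __iadd__, so Python falls back to __add__
print(v is v_before, (v.a, v.b))   # False (7, 8)

w = InPlaceVector(2, 10)
w_before = w
w += Vector(5, -2)           # __iadd__ modifies w in place
print(w is w_before, (w.a, w.b))   # True (7, 8)
```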

Overloading basic functionality 

Table 7-1 lists some generic functionality that you can override in your own classes.

Table 7-1
Base Overloading Methods

Method                            Sample Call
__init__(self[, args...])         obj = className(args)
__del__(self)                     del obj
__call__(self[, args...])         obj(5)   (callable function)
__getattr__(self, name)           obj.foo
__setattr__(self, name, value)    obj.foo = 5
__delattr__(self, name)           del obj.foo
__repr__(self)                    `obj` or repr(obj)
__str__(self)                     str(obj)
__cmp__(self, x)                  cmp(obj, x)
__lt__(self, x)                   obj < x
__le__(self, x)                   obj <= x
__eq__(self, x)                   obj == x
__ne__(self, x)                   obj != x
__gt__(self, x)                   obj > x
__ge__(self, x)                   obj >= x
__hash__(self)                    hash(obj)
__nonzero__(self)                 if obj: ...   or operator.truth(obj)
Note that with the del statement, Python won’t call the __del__ method unless the object’s reference count finally drops to 0.

Python invokes the __call__ method any time someone tries to treat an instance of your object as a function. Users can test for “callability” using the callable(obj) function, which tries to determine if the object is callable (callable may return true and be wrong, but if it returns false, the object really isn’t callable).

Python calls the __getattr__ method only after a search through the instance dictionary, the class, and its base classes comes up empty-handed. Your implementation should return the desired attribute or raise an AttributeError exception. If __setattr__ needs to assign a value to an instance variable, be sure to assign it to the instance dictionary instead (self.__dict__[name] = val) to prevent a recursive call to __setattr__. If your class has a __setattr__ method, Python always calls it to set member variable values, even if the instance dictionary already contains the variable being set.
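Here is a Python 3 sketch of both rules, using a hypothetical Record class whose unknown attributes read as None and whose assignments are logged. It writes through self.__dict__ as described above; in Python 3 code, object.__setattr__(self, name, value) is a common alternative.

```python
class Record:
    """Hypothetical class: unknown attributes read as None, assignments are logged."""

    def __init__(self):
        # Write straight into the instance dictionary to bypass __setattr__.
        self.__dict__['changes'] = []

    def __getattr__(self, name):
        # Called only when normal lookup (instance dict, then classes) fails.
        if name.startswith('__'):
            raise AttributeError(name)   # don't pretend to support special names
        return None

    def __setattr__(self, name, value):
        self.changes.append(name)        # normal lookup finds 'changes'
        self.__dict__[name] = value      # avoids a recursive __setattr__ call

r = Record()
r.color = 'red'
print(r.color, r.size)    # red None
print(r.changes)          # ['color']
```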

The __hash__ and __cmp__ methods are closely related: if you do not implement __cmp__, you should not implement __hash__. If you provide a __cmp__ but no __hash__, then instances of your object can’t act as dictionary keys (which is correct if your objects are mutable). Hash values are 32-bit integers, and two instances that are considered equal should also return the same hash value.
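Python 3 dropped __cmp__, but the same rule carries over to __eq__: a class that defines __eq__ without __hash__ gets its __hash__ set to None, so its instances cannot be dictionary keys. This sketch uses made-up Point and MutablePoint classes; Point keeps equality and hashing consistent by hashing the same tuple that __eq__ compares.

```python
class Point:
    def __init__(self, x, y):
        self._coords = (x, y)    # treated as read-only after construction

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._coords == other._coords

    def __hash__(self):
        # Equal points have equal tuples, so they also get equal hashes.
        return hash(self._coords)

class MutablePoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, MutablePoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)
    # No __hash__ here, so Python 3 sets MutablePoint.__hash__ to None.

labels = {Point(1, 2): 'start'}
print(labels[Point(1, 2)])             # start: an equal but distinct key works
print(MutablePoint.__hash__ is None)   # True
try:
    {MutablePoint(1, 2): 'start'}
except TypeError as err:
    print('unhashable:', err)
```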

The __nonzero__ method performs truth-value testing, so your implementation should return 0 or 1. If it is not implemented, Python looks for a __len__ implementation to use, and if that is not found either, all instances of your object are considered “true.”

You use the __lt__, __gt__, and other methods to implement support for rich comparisons, where you have more complete control over how objects behave during different types of comparisons. If present, Python calls any of these methods before looking for a __cmp__ method. The following example prints a message each time Python calls a comparison method so you can see what happens:

>>> class Simple:
        def __cmp__(self, obj):
            print '__cmp__'
            return 1
        def __lt__(self, obj):
            print '__lt__'
            return 0

>>> s = Simple()
>>> s < 5
__lt__      # Python uses rich comparisons first.
0
>>> s > 5
__cmp__     # Uses __cmp__ if there are no rich comparison methods.
1
Tip Your rich comparison methods can return NotImplemented to tell Python that you don’t want to handle a particular comparison. For example, the following class implements an equality method that works on integers. If the object to which it is comparing isn’t an integer, it tells Python to figure out the comparison result on its own:

>>> class MyInt:
        def __init__(self, val):
            self.val = val
        def __eq__(self, obj):
            print '__eq__'
            if type(obj) != type(0):
                print 'Skipping'
                return NotImplemented
            return self.val == obj

>>> m = MyInt(16)
>>> m == 10
__eq__
0
>>> m == 'Hi'
__eq__
Skipping
0


Although __cmp__ methods must return an integer to represent the result of the comparison, rich comparison methods can return data of any type or raise an exception if a particular comparison is invalid or meaningless.

New Feature Rich comparisons are new in Python 2.1.
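Python 3 has no __cmp__ fallback, so NotImplemented matters even more there: if both operands decline an == comparison, Python falls back to an identity test, and if both decline an ordering comparison such as <, it raises TypeError. A Python 3 sketch of the MyInt idea, without the tracing print calls:

```python
class MyInt:
    def __init__(self, val):
        self.val = val

    def __eq__(self, obj):
        if not isinstance(obj, int):
            return NotImplemented   # let Python try the other operand, then fall back
        return self.val == obj

    def __lt__(self, obj):
        if not isinstance(obj, int):
            return NotImplemented
        return self.val < obj

m = MyInt(16)
print(m == 16, m == 10)   # True False
print(m == 'Hi')          # False: both sides declined, so identity is used
print(m < 20, 20 > m)     # True True: 20 > m uses the reflected m.__lt__(20)

refused = False
try:
    m < 'Hi'
except TypeError:
    refused = True        # no ordering is defined between MyInt and str
print(refused)            # True
```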

Overloading numeric operators 

By overloading the numeric operator methods, your classes can correctly respond to operators like +, -, and so on. Note that Python calls the right-hand side version of an operator (for example, __radd__) if the left-hand operand doesn’t have the corresponding method (__add__) defined:

>>> class Add:
        def __init__(self, val):
            self.val = val
        def __add__(self, obj):
            print 'add', obj
            return self.val + obj
        def __radd__(self, obj):
            print 'radd', obj
            return self.val + obj

>>> a = Add(10)
>>> a
<__main__.Add instance at 00E5D354>
>>> a + 5    # Calls a.__add__(5).
add 5
15
>>> 5 + a    # Calls a.__radd__(5).
radd 5
15

Table 7-2 lists the mathematical operations (and the right-hand and in-place variants) that you can overload and examples of how to invoke them.


Table 7-2
Numeric Operator Methods

Method                                              Sample Call
__add__(self, obj), __radd__, __iadd__              obj + 10.5
__sub__(self, obj), __rsub__, __isub__              obj - 16
__mul__(self, obj), __rmul__, __imul__              obj * 5.1
__div__(self, obj), __rdiv__, __idiv__              obj / 15
__mod__(self, obj), __rmod__, __imod__              obj % 2
__divmod__(self, obj), __rdivmod__                  divmod(obj, 3)
__pow__(self, obj[, modulo]), __rpow__(self, obj)   pow(obj, 3)
__neg__(self)                                       -obj
__pos__(self)                                       +obj
__abs__(self)                                       abs(obj)
__invert__(self)                                    ~obj
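A short Python 3 sketch with a hypothetical Money class shows several rows of the table working together, including a right-hand method (__rmul__) and the unary __neg__ and __abs__. Note that Python 3 replaced __div__ with __truediv__ (for /) and __floordiv__ (for //), so this sketch sticks to operators whose method names did not change.

```python
class Money:
    """Hypothetical amount of money stored as whole cents."""
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        return NotImplemented

    def __mul__(self, factor):
        if isinstance(factor, int):
            return Money(self.cents * factor)
        return NotImplemented

    # 3 * m: int can't multiply by Money, so Python calls m.__rmul__(3).
    __rmul__ = __mul__

    def __neg__(self):
        return Money(-self.cents)

    def __abs__(self):
        return Money(abs(self.cents))

    def __repr__(self):
        return 'Money(%d)' % self.cents

m = Money(250)
print(m + Money(50))    # Money(300)
print(3 * m, m * 2)     # Money(750) Money(500)
print(-m, abs(-m))      # Money(-250) Money(250)
```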


Overloading sequence and dictionary operators 

If you create your own sequence or mapping data type, or if you just like those nifty 
bracket operators, you can overload the sequence operators with the methods 
listed in Table 7-3. 
Table 7-3
Sequence and Dictionary Operator Methods

Method                                  Sample Call
__len__(self)                           len(obj)
__getitem__(self, key)                  obj['cheese']
__setitem__(self, key, value)           obj[5] = (2,5)
__delitem__(self, key)                  del obj['no']
__setslice__(self, i, j, sequence)      obj[1:7] = 'Fellow'
__delslice__(self, i, j)                del obj[5:7]
__contains__(self, obj)                 x in obj


This class overrides the slice operator to provide an inefficient way to create a list 
of numbers: 


>>> class DumbRange:
        def __getitem__(self, slice):
            step = slice.step
            if step is None:
                step = 1
            return range(slice.start, slice.stop+1, step)

>>> d = DumbRange()
>>> d[2:5]
[2, 3, 4, 5]
>>> d[2:10:2]    # Extended slicing
[2, 4, 6, 8, 10]


The argument to __getitem__ is either an integer or a slice object. Slice objects have start, stop, and step attributes, so your class can support the extended slicing shown in the example.

If the key passed to __getitem__ is of the wrong type, your implementation should raise the TypeError exception, and the slice methods should reject invalid indices by raising the IndexError exception.
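Python 3 removed __getslice__, __setslice__, and __delslice__, so every slice, simple or extended, reaches __getitem__ (or __setitem__ and __delitem__) as a slice object. This Python 3 sketch of the DumbRange idea assumes both start and stop are supplied, and it raises TypeError for any key that is not a slice, as recommended above.

```python
class DumbRange:
    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError('DumbRange only supports slices, not %s'
                            % type(key).__name__)
        step = key.step if key.step is not None else 1
        # Like the session above, include the stop value in the result.
        return list(range(key.start, key.stop + 1, step))

d = DumbRange()
print(d[2:5])       # [2, 3, 4, 5]
print(d[2:10:2])    # [2, 4, 6, 8, 10]
try:
    d['cheese']
except TypeError as err:
    print(err)      # DumbRange only supports slices, not str
```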

If your __getitem__ method raises IndexError on an invalid index, Python can iterate over object instances as if they were sequences. The following class behaves like a range object with a user-supplied step, but it limits itself to only 6 iterations:

>>> class Stepper:
        def __init__(self, step):
            self.step = step
        def __getitem__(self, index):
            if index > 5:
                raise IndexError
            return self.step * index

>>> s = Stepper(3)
>>> for i in s:
        print i

0     # Python calls __getitem__ with index=0
3
6
9
12
15    # Python stops after a __getitem__ call raises IndexError


Overloading bitwise operators 

The bitwise operators let your classes support operators such as << and ^ (exclusive or):

>>> class Vector2D:
        def __init__(self, i, j):
            self.i = i
            self.j = j
        def __lshift__(self, x):
            return Vector2D(self.i << x, self.j << x)
        def __repr__(self):
            return 'Vector2D(%s,%s)' % (self.i, self.j)

>>> v1 = Vector2D(5,2)
>>> v1 << 2
Vector2D(20,8)

Table 7-4 lists the methods you define to overload the bitwise operators.


Table 7-4
Bitwise Operator Methods

Method                                             Sample Call
__lshift__(self, obj), __rlshift__, __ilshift__    obj << 3
__rshift__(self, obj), __rrshift__, __irshift__    obj >> 1
__and__(self, obj), __rand__, __iand__             obj & 17
__or__(self, obj), __ror__, __ior__                obj | otherObj
__xor__(self, obj), __rxor__, __ixor__             obj ^ 0xFE
Overloading type conversions

By overloading type conversion methods, you can convert your object to different data types as needed:

>>> class Five:
        def __int__(self):
            return 5

>>> f = Five()
>>> int(f)
5

Python calls these methods when you pass an object to one of the type conversion 
routines. Table 7-5 lists the methods, sample Python code that would invoke them, 
and sample output they might return. 


Table 7-5
Type Conversion Methods

Method               Sample Call      Sample Output
__int__(self)        int(obj)         53
__long__(self)       long(obj)        12L
__float__(self)      float(obj)       3.5
__complex__(self)    complex(obj)     (2+3j)
__oct__(self)        oct(obj)         '012'
__hex__(self)        hex(obj)         '0xFE'


Python calls the __coerce__(self, obj) method, if present, to coerce two numerical types into a common type before applying an arithmetic operation. Your implementation should return a 2-item tuple containing self and obj converted to a common numerical type, or None if you don’t support that conversion.
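Python 3 dropped __long__, __oct__, __hex__, and __coerce__. The int(), float(), and complex() built-ins still call __int__, __float__, and __complex__, while hex(), oct(), bin(), and sequence indexing call __index__, which must return an integer. A sketch with a hypothetical Five class:

```python
class Five:
    def __int__(self):
        return 5

    def __float__(self):
        return 5.0

    def __index__(self):
        # Used by hex(), oct(), bin(), and sequence indexing.
        return 5

f = Five()
print(int(f), float(f))    # 5 5.0
print(hex(f), oct(f))      # 0x5 0o5
print('abcdefg'[f])        # f
```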


Using Weak References

Like many other high-level languages, Python uses a form of garbage collection to 
automatically destroy objects that are no longer in use. Each Python object has a 
reference count that tracks how many references to that object exist; when the ref- 
erence count is 0, then Python can safely destroy the object. 

While reference counting saves you quite a bit of error-prone memory management work, there can be times when you want a weak reference to an object, or a reference that doesn’t prevent Python from garbage collecting the object if no other references exist. With the weakref module, you can create weak references to objects, and Python will garbage collect an object if its reference count is 0 or if the only references that exist are weak references.



The weakref module is new in Python 2.1. 


Creating weak references 


You create a weak reference by calling ref(obj[, callback]) in the weakref module, where obj is the object to which you want a weak reference and callback is an optional function to call when Python is about to destroy the object because no strong references to it remain. The callback function takes a single argument, the weak reference object.

Once you have a weak reference to an object, you can retrieve the referenced object by calling the weak reference. The following example creates a weak reference to a socket object:

>>> from socket import *
>>> import weakref
>>> s = socket(AF_INET,SOCK_STREAM)
>>> ref = weakref.ref(s)
>>> s
<socket._socketobject instance at 007B4A94>
>>> ref
<weakref at 0x81195c; to 'instance' at 0x7b4a94>
>>> ref()    # Call it to access the referenced object.
<socket._socketobject instance at 007B4A94>

Once there are no more strong references to an object, calling the weak reference returns None because Python has destroyed the object.

Note Most objects are not accessible through weak references. 


The getweakrefcount(obj) and getweakrefs(obj) functions in the weakref module return, respectively, the number of weak references to the given object and a list of those weak reference objects.
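The following Python 3 sketch uses an instance of a hypothetical Connection class rather than a socket, since instances of ordinary classes support weak references while many built-in objects (such as lists and dicts) do not. It assumes CPython, which destroys an object as soon as its reference count reaches zero, so the callback runs right at the del statement.

```python
import weakref

class Connection:
    """Hypothetical stand-in for an expensive resource."""
    def __init__(self, name):
        self.name = name

events = []

def onCollect(ref):
    # Called with the (now dead) weak reference object.
    events.append('collected')

conn = Connection('db')
ref = weakref.ref(conn, onCollect)
print(ref() is conn)                   # True: calling the ref returns the object
print(weakref.getweakrefcount(conn))   # 1
del conn                               # the last strong reference goes away
print(ref(), events)                   # None ['collected']

try:
    weakref.ref([1, 2, 3])
except TypeError:
    print('lists cannot be weakly referenced')
```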

Weak references can be useful for creating caches of objects that are expensive to create. For example, suppose you are building a distributed application that sends messages between computers using connection-based network sockets. In order to reuse the socket connections without keeping unused connections open, you decide to keep a cache of open connections:

import weakref
from socket import *

socketCache = {}
def getSocket(addr):
    'Returns an open socket object'
    if socketCache.has_key(addr):
        sock = socketCache[addr]()
        if sock is not None:    # Return the cached socket.
            return sock

    # No socket found, so create and cache a new one.
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect(addr)
    socketCache[addr] = weakref.ref(sock)
    return sock

In order to send a message to a remote computer, your program calls getSocket to obtain a socket object. If a connection to the given address doesn’t already exist, getSocket creates a new one and adds a weak reference to the cache. When all strong references to a given socket are gone, Python destroys the socket object and the next request for the same connection will cause getSocket to create a new one.

The mapping([dict[, weakkeys]]) function in the weakref module returns a weak dictionary (initializing it with the values from the optional dictionary dict). If weakkeys is 0 (the default), the dictionary automatically removes any entry whose value no longer has any strong references to it. If weakkeys is nonzero, the dictionary automatically removes entries whose keys no longer have strong references.


Creating proxy objects 

Proxy objects are weak reference objects that behave like the object they reference so that you don’t have to first call the weak reference to access the underlying object. Create a proxy by calling the weakref module’s proxy(obj[, callback]) function. You use the proxy object as if it were the actual object it references:

>>> from socket import *
>>> import weakref
>>> s = socket(AF_INET,SOCK_STREAM)
>>> ref = weakref.proxy(s)
>>> s
<socket._socketobject instance at 007E4874>
>>> ref          # It looks like the socket object.
<socket._socketobject instance at 007E4874>
>>> ref.close()  # The object's methods work too.
The callback parameter has the same purpose as the callback parameter of the ref
function. After Python deletes the referenced object, using the proxy raises a
weakref.ReferenceError exception:

>>> del s
>>> ref
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
weakref.ReferenceError: weakly-referenced object no longer exists

This example assumes that Python immediately destroys the object once the last
strong reference to it is gone. While true of the current garbage collector implementation,
future versions may be different.
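The following sketch shows a proxy outliving its referent without needing a network connection. It uses a hypothetical Resource class in place of the socket and assumes modern Python, where ReferenceError is a built-in exception (it was previously known as weakref.ReferenceError):

```python
import gc
import weakref


class Resource(object):
    """Hypothetical object with one method, used only to demonstrate proxies."""

    def close(self):
        return 'closed'


def proxy_demo():
    res = Resource()
    prox = weakref.proxy(res)
    before = prox.close()  # the proxy forwards attribute access to the live object
    del res                # drop the only strong reference
    gc.collect()           # precaution only; CPython normally frees the object at once
    try:
        prox.close()
        after = 'still alive'
    except ReferenceError:  # raised when a proxy's referent no longer exists
        after = 'ReferenceError'
    return before, after
```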


Summary 

Python fully supports object-oriented programming while requiring minimal effort 
from you, the programmer. In this chapter you: 

♦ Created your own custom classes.

♦ Derived new classes from other classes.

♦ Extended built-in data types like strings and lists.

♦ Defined custom behaviors for operations on your classes.

In the next chapter you learn to create programs that interact with the user and 
store and retrieve data. 


Input and Output




In order to be useful, most programs must interact with the
“outside world” in some way. This chapter introduces
Python’s functions for reading and writing files, printing to the
screen, and retrieving keyboard input from the user.


Printing to the Screen 

The simplest way to produce output is using the print statement,
which converts the expressions you pass it to a string
and writes the result to standard output, which by default is
the screen or console. You can pass in zero or more expressions,
separated by commas, between which print inserts a
space:

>>> print 'It is', 5+7, 'past', 3
It is 12 past 3

Before printing each expression, print converts any non- 
string expressions using the str function. If you don’t want 
the spaces between expressions, you can do the conversions 
yourself: 





>>> a = 5.1; z = (0,5,10)
>>> print '(%0.1f + %0.1f) = \n%0.1f' % (a,a,a*2)
(5.1 + 5.1) =
10.2
>>> print 'Move to '+str(z)
Move to (0, 5, 10)
>>> print 'Two plus ten is '+`2+10` # `` is the same as repr.
Two plus ten is 12


Cross-Reference: Chapter 3 covers converting different data types to strings.
If you append a trailing comma to the end of the statement, print won’t move to
the next line:


>>> def addEm(x,y):
    print x,
    print 'plus',
    print y,
    print 'is',
    print x+y

>>> addEm(5,2)
5 plus 2 is 7


Python uses the softspace attribute of stdout (stdout is in the sys module) to 
track whether it needs to output a space before the next item to be printed. You can 
use this feature to manually shut off the space that would normally appear due to 
using a comma: 


>>> import sys
>>> def joinEm(a,b):
    print a,
    sys.stdout.softspace = 0
    print b

>>> joinEm('Thanks','giving')
Thanksgiving


An extended form of the print statement lets you redirect output to a file instead
of standard output:

>>> print >> sys.stderr, "File not found"
File not found

New Feature: The extended form of print was introduced in Python 2.0.


Any filelike object will do, as you will see in the “Using Filelike Objects” section later 
in this chapter. 


Accessing Keyboard Input 

Going the other direction, Python provides two built-in functions to retrieve a line
of text from standard input, which by default comes from the user’s keyboard. The
examples in this section use italics for text you enter in response to the prompts.

raw_input

The raw_input([prompt]) function reads one line from standard input and
returns it as a string (removing the trailing newline):







>>> s = raw_input()
Uncle Gomez
>>> print s
Uncle Gomez

You can also specify a prompt for raw_input to use while waiting for user input:

>>> s = raw_input('Command: ')
Command: launch missiles
>>> print 'Ignoring command to',s
Ignoring command to launch missiles

If raw_input encounters the end of file, it raises the EOFError exception.


input 

The input([prompt]) function is equivalent to raw_input, except that it assumes
the input is a valid Python expression and returns the evaluated result to you:

>>> input('Enter some Python: ')
Enter some Python: [x*5 for x in range(2,10,2)]
[10, 20, 30, 40]


Note that input isn’t at all error-proof. If the expression passed in is bogus, input
raises the appropriate exception. Because input evaluates whatever the user types,
be wary when using this function in your programs.



Chapter 38 covers the readline module for UNIX systems. If enabled, this module
adds command history tracking and completion to these input routines (and to
Python's interactive mode as well).


Cross-Reference: You may have noticed that you can't read one character at a time (instead you
have to wait until the user hits Enter). To read a single character on UNIX systems
(or any system with curses support), you can use the getch method of a window object
from the curses module (Chapter 22). For Windows systems, you can use the getch
function in the msvcrt module (Chapter 37).


Opening, Closing, and Positioning Files 

The remaining sections in this chapter show you how to use files in your programs. 


Cross-Reference: Part II of this book, "Files, Data Storage, and Operating System Services,"
covers many additional features you'll find useful when using files.




open 

Before you can read or write a file, you have to open it using Python’s built-in
open(name[, mode[, bufsize]]) function:

>>> f = open('foo.txt','wt',1) # Open foo.txt for writing.
>>> f
<open file 'foo.txt', mode 'wt' at 010C0488>

The mode parameter is a string (similar to the mode parameter in C’s fopen 
function) and is explained in Table 8-1. 



Table 8-1
Mode Values for open

Value   Description
r       Opens for reading.
w       Creates a file for writing, destroying any previous file with the same name.
a       Opens for appending to the end of the file, creating a new one if it does not already exist.
r+      Opens for reading and writing (the file must already exist).
w+      Creates a new file for reading and writing, destroying any previous file with the same name.
a+      Opens for reading and appending to the end of the file, creating a new file if it does not already exist.


If you do not specify a mode string, open uses the default of 'r'. To the end of the
mode string you can append a ‘t’ to open the file in text mode or a ‘b’ to open it in
binary mode:

>>> f = open('somepic.jpg','w+b') # Open/create binary file.

If you omit the optional buffer size parameter (or pass in a negative number), open
uses the system’s default buffering. A value of 0 is for unbuffered reading and writing,
a value of 1 buffers data a line at a time, and any other number tells open to use a
buffer of that size (some systems round the number down to the nearest power of 2).

If for any reason the function call fails (the file doesn’t exist, you don’t have
permission), open raises the IOError exception.
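Because open raises IOError on failure, a program that treats a missing file as non-fatal can catch that exception. Here is a minimal sketch using a hypothetical helper named read_text_or_default. It also runs on Python 3, where IOError is an alias of OSError:

```python
def read_text_or_default(path, default=''):
    """Return the contents of path, or default if open() fails (missing file, no permission)."""
    try:
        f = open(path, 'r')
    except IOError:  # an alias of OSError in Python 3, so this still catches the failure
        return default
    try:
        return f.read()
    finally:
        f.close()  # always release the file object, even if read() raises
```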

Cross-Reference: The os module (Chapter 10) provides the fdopen, popen, popen2, and popen3
functions as additional ways to obtain file objects. You can also create a filelike object
wrapping an open socket with the makefile method of socket objects (Chapter 15).
File object information

Once you have a file object, you can use the name, fileno(), isatty(), mode, and
closed methods and attributes to get different information about the object’s
status:

>>> f = open('foo.txt','wt')
>>> f.mode # Get the mode used to create the file object.
'wt'
>>> f.closed # Boolean: has the file been closed already?
0
>>> f.name # Get the name of the file.
'foo.txt'
>>> f.isatty() # Is the file connected to a terminal?
0
>>> f.fileno() # Get the file descriptor number.
0


Cross-Reference: With the file descriptor returned by the fileno method you can call read and
other functions in the os module (Chapter 10).


close 

The close() method of a file object flushes any unwritten information and closes
the file object, after which no more writing can occur:

>>> f = open('foo.txt','wt')
>>> f.write('Foo!!')
>>> f.close()
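The sketch below checks the attributes described above on a scratch file from the tempfile module and shows that writing after close fails. In current Python, that failure is a ValueError. close_demo is a hypothetical name:

```python
import os
import tempfile


def close_demo():
    """Open a scratch file, inspect it, close it, then try to write again."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # only the file name is needed; open() below creates a new file object
    f = open(path, 'w')
    try:
        info = (f.name == path, f.mode, f.closed, f.isatty())
        f.write('Foo!!')
        f.close()  # flushes 'Foo!!' to disk
        try:
            f.write('more')
            after_close = 'write allowed'
        except ValueError:  # writing to a closed file raises ValueError
            after_close = 'ValueError'
        g = open(path)
        try:
            content = g.read()
        finally:
            g.close()
        return info, f.closed, after_close, content
    finally:
        f.close()  # closing an already-closed file is harmless
        os.remove(path)
```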


File position 

The tell() method tells you the current position within the file (in other words,
the next read or write will occur at that many bytes from the beginning of the file):

>>> f = open('tell.txt','w+') # Open for reading AND writing.
>>> f.write('BRAWN') # Write 5 characters.
>>> f.tell()
5 # Next operation will occur at offset 5 (starting from 0).

The seek(offset[, from]) method changes the current file position. The following
example continues the previous one by seeking to an earlier point in the file,
overwriting some of the previous data, and then reading the entire file:

>>> f.seek(2) # Move to offset 2 from the start of the file.
>>> f.write('AI')
>>> f.seek(0) # Now move back to the beginning.
>>> f.read() # Read everything from here on.
'BRAIN'
You can pass an additional argument to seek to change how it interprets the first
parameter. Use a value of 0 (which is the default) to seek from the beginning of the 
file, 1 to seek relative to the current position, and 2 to seek relative to the end of the 
file. Using the previous example: 

>>> f.seek(-4,2) # Seek 4 bytes back from the end of the file.
>>> f.read()
'RAIN'

Caution: When you open a file in text mode on a Microsoft Windows system, Windows
silently and automatically translates newline characters ('\n') into '\r\n' when writing
(and back again when reading). In such cases use seek only with an offset of 0 (to seek
to the beginning or the end of the file) or to seek from the beginning of the file with
an offset returned from a previous call to tell.
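The following sketch replays the BRAWN/BRAIN example in binary mode, where every form of seek is permitted. brain_demo is a hypothetical name. It uses b'...' literals because Python 3 keeps bytes separate from text, and Python 3 text-mode files reject nonzero seeks relative to the current position or the end:

```python
import os
import tempfile


def brain_demo():
    """Replay the BRAWN -> BRAIN example in binary mode and return what tell/read report."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        f = open(path, 'w+b')  # binary mode, so end-relative seeks are allowed everywhere
        try:
            f.write(b'BRAWN')
            offset = f.tell()    # 5
            f.seek(2)            # from the start (the from argument defaults to 0)
            f.write(b'AI')
            f.seek(0)
            whole = f.read()     # b'BRAIN'
            f.seek(-4, 2)        # 4 bytes back from the end
            tail = f.read()      # b'RAIN'
            f.seek(-3, 1)        # 3 bytes back from the current position (the end)
            last3 = f.read()     # b'AIN'
        finally:
            f.close()
    finally:
        os.remove(path)
    return offset, whole, tail, last3
```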


Writing Files 

The write(str) method writes any string to an open file (keep in mind that
Python strings can hold binary data and not just text). Notice that write does not
add a newline character (‘\n’) to the end of the string:

>>> f = open('snow.txt','w+t')
>>> f.write('Once there was a snowman,\nsnowman, snowman.\n')
>>> f.seek(0) # Move to the beginning of the file.
>>> print f.read()
Once there was a snowman,
snowman, snowman.

The writelines(list) method takes a list of strings to write to a file (as with
write, it does not append newline characters to the end of each string you pass
in). Continuing the previous example:

>>> lines = ['Once there was a snowman ','tall, tall, ','tall!']
>>> f.writelines(lines)
>>> f.seek(0)
>>> print f.read()
Once there was a snowman,
snowman, snowman.
Once there was a snowman tall, tall, tall!

Tip: Like stdout, all file objects have a softspace attribute (covered in the first
section of this chapter) telling whether or not Python should insert a space before
writing out the next piece of data. As with stdout, you can modify this attribute to
shut off that extra space.

The truncate([offset]) method deletes the contents of the file from the current 
position until the end of the file: 
>>> f.seek(10)
>>> f.truncate()
>>> f.seek(0)
>>> print f.read()
Once there


Optionally you can specify a file position at which to truncate instead of the current 
file position: 


>>> f.seek(0)
>>> f.truncate(5)
>>> print f.read()
Once


You can also use the flush() method to commit any buffered writes to disk.
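This sketch combines write, writelines, truncate, and flush on a scratch file. It returns the contents before and after truncating. snowman_demo is a hypothetical name, and the strings are sample data:

```python
import os
import tempfile


def snowman_demo():
    """Return the file contents after write/writelines, and again after truncate(4)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        f = open(path, 'w+')
        try:
            f.write('Once there was a snowman,\n')       # write adds no newline of its own
            f.writelines(['tall, ', 'tall, ', 'tall!'])  # nor does writelines
            f.seek(0)
            before = f.read()
            f.seek(0)
            f.truncate(4)  # keep only the first 4 characters
            f.flush()      # push any buffered data out to the operating system
            f.seek(0)
            after = f.read()
        finally:
            f.close()
    finally:
        os.remove(path)
    return before, after
```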



See the pickle, shelve, and struct modules in Chapter 12 for information on
writing Python objects to files in such a way that you can later read them back in
as valid objects.


Reading Files 

The read([count]) method returns the specified number of bytes from a file (or
fewer if it reaches the end of the file):

>>> f = open('read.txt','w+t') # Create a file.
>>> for i in range(3):
    f.write('Line #%d\n' % i)

>>> f.seek(0)
>>> f.read(3) # Read 3 bytes from the file.
'Lin'

If you don’t ask for a specific number of bytes, read returns the remainder of the file: 

>>> print f.read() 
e #0 
Line #1 
Line #2 

The readline([count]) method returns a single line, including the trailing newline
character if present:

>>> f.seek(0)
>>> f.readline()
'Line #0\012'
You can have readline return a certain number of bytes or an entire line
(whichever comes first) by passing in a size argument:

>>> f.readline(5)
'Line '
>>> f.readline()
'#1\012'

The readlines([sizehint]) method repeatedly calls readline and returns a list
of lines read:

>>> f.seek(0)
>>> f.readlines()
['Line #0\012', 'Line #1\012', 'Line #2\012']

Once they reach the end of the file, the read and readline methods return
empty strings, and the readlines method returns an empty list.

The optional sizehint parameter limits how much data readlines reads into
memory instead of reading until the end of the file.
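You can check the readline behaviors above, including the end-of-file results, without touching the disk. This sketch uses io.StringIO, an in-memory filelike object available in later Python versions (2.6 and up, and Python 3). readline_demo is a hypothetical name:

```python
import io


def readline_demo():
    """Walk an in-memory file with readline/readlines, including the end-of-file results."""
    f = io.StringIO(u'Line #0\nLine #1\nLine #2\n')
    first = f.readline()       # 'Line #0\n', newline included
    partial = f.readline(5)    # 'Line ', stops after 5 characters
    rest = f.readline()        # '#1\n', the remainder of that line
    remaining = f.readlines()  # ['Line #2\n']
    at_end = (f.read(), f.readline(), f.readlines())  # ('', '', []) at end of file
    return first, partial, rest, remaining, at_end
```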

When you’re processing the lines of text in a file, you often want to remove the newline
characters along with any leading or trailing whitespace. Here’s an easy way to
open the file, read the lines, and remove the newlines all in a single step (this example
assumes you have the read.txt file from above):

>>> [x.strip() for x in open('read.txt').readlines()]
['Line #0', 'Line #1', 'Line #2'] # Yay, Python!

One drawback to the readlines method is that it reads the entire file into memory
before returning it to you as a list (unless you supply a sizehint parameter, in
which case you have to call readlines over and over again until the end of the
file). The xreadlines method works like readlines but reads data into memory as
needed:

>>> for line in open('read.txt').xreadlines():
    print line.strip().upper() # Print uppercase version of lines.

New Feature: The xreadlines method is new in Python 2.1.
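Later Python releases let you loop over a file object directly. This reads lines on demand, much as xreadlines does, and eventually replaced it. Here is a sketch with a hypothetical helper named upper_lines:

```python
def upper_lines(path):
    """Return each line of path stripped and uppercased, reading one line at a time."""
    f = open(path)
    try:
        result = []
        for line in f:  # the file object itself yields lines lazily
            result.append(line.strip().upper())
        return result
    finally:
        f.close()
```

Only the returned list grows with the file. The file itself is never read into memory all at once.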


Accessing Standard I/O

The sys module provides three file objects that you can always use: stdin
(standard input), stdout (standard output), and stderr (standard error). Most
often stdin holds input coming from the user’s keyboard while stdout and stderr
print messages to the user’s screen.

Some IDEs like PythonWin implement their own version of stdin, stdout,
input, and so on, so redirecting them may behave differently. When in doubt, try
it out from the command line.


Routines like input and raw_input read from stdin, and routines like print write
to stdout, so an easy way to redirect input and output is to put file objects of your
own into sys.stdin and sys.stdout:

>>> import sys
>>> sys.stdout = open('fakeout.txt','wt')
>>> print "Now who's going to the restaurant?"
>>> sys.stdout.close()
>>> sys.stdout = sys.__stdout__
>>> open('fakeout.txt').read()
"Now who's going to the restaurant?\012"

As the example shows, the original values are in the __stdin__, __stdout__, and
__stderr__ members of sys; be a good Pythonista and point the variables back to their
original values when you’re done fiddling with them.
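Here is a sketch of the same redirection packaged so that stdout is always restored, even if the code being run raises an exception. Collector and capture_stdout are hypothetical names. Collector is just a filelike object that remembers what is written to it:

```python
import sys


class Collector(object):
    """Hypothetical filelike object that simply remembers everything written to it."""

    def __init__(self):
        self.parts = []

    def write(self, s):
        self.parts.append(s)

    def flush(self):
        pass  # nothing is buffered, but some callers expect flush() to exist

    def getvalue(self):
        return ''.join(self.parts)


def capture_stdout(func):
    """Run func() with sys.stdout redirected to a Collector; return what it wrote."""
    saved = sys.stdout
    collector = Collector()
    sys.stdout = collector
    try:
        func()
    finally:
        sys.stdout = saved  # restore the previous value, even if func() raises
    return collector.getvalue()
```

Restoring the saved value rather than sys.__stdout__ keeps this cooperative with an IDE or other code that had already replaced stdout.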

Note: External programs started via os.system or os.popen do not look in
sys.stdin or sys.stdout. As a result, their input and output come from the
normal sources, regardless of changes you’ve made to Python's idea of stdin and
stdout.


Using Filelike Objects 

One of the great features of Python is its flexibility with data types, and a neat 
example of this is with file objects. Many functions or methods require that you 
pass in a file object, but more often than not you can get away with passing in an 
object that acts like a file instead. 

The following example implements a filelike object that reverses the order of anything
you write to it and then sends it to the original version of stdout:

>>> import sys, string
>>> class Reverse:
    def write(self,s):
        s = list(s)
        s.reverse()
        sys.__stdout__.write(string.join(s, ''))
        sys.__stdout__.flush()

Not much of a file object, is it? But you’d be surprised at how often it’ll do the trick:

>>> sys.stdout = Reverse()
>>> print 'Python rocks!'
!skcor nohtyP
Detecting Redirected Input

Suppose you're writing a nifty utility program that would most often be used in a script
where the input would come from piped or redirected input, but you also want to provide
more of an interactive mode for other uses. Instead of having to pass in a command line
parameter to choose the mode, your program could use the isatty method of sys.stdin to
detect it for you.

To see this in action, save this tiny program to a file called myutil.py:

import sys

if sys.stdin.isatty():
    print 'Interactive mode!'
else:
    print 'Batch mode!'

Now run it from an MS-DOS or UNIX shell command prompt:

C:\temp>python myutil.py
Interactive mode!

Run it again, this time redirecting a file to stdin using the redirection character (any file
works as input; in the example below I chose myutil.py because you're sure to have it in
your directory):

C:\temp>python myutil.py < myutil.py
Batch mode!

Likewise, a more complex (and hopefully more useful) utility could automatically behave
differently depending on whether a person or a file was supplying the input.


In fact, you can trick most of Python into using your new file object, even when
printing error messages:

>>> sys.stderr = Reverse()
>>> Reverse.foo # This action causes an error.

:)tsal llac tnecer tsom( kcabecarT
? ni ,1 enil ,">nidts<" eliF rorrEetubirttA :oof

The point here is that no part of the Python interpreter or the standard libraries
has any knowledge of your special file class, nor does it need to. Sometimes a custom
class can act like one of a different type even if it’s not derived from a common
base class (that is, files and Reverse do not share some common “generic file”
superclass).

One instance in which this feature is useful is when you’re building GUI-based
applications (see Chapter 19) and you want text messages to go to a graphical
window instead of to the console. Just write your own filelike class that sends a
string to the window, replace sys.stdout (and probably sys.stderr), and
magically output goes to the right place, even if some third-party module that is
completely ignorant of your trickery generates the output.

This flexibility comes in handy at other times too. For example, map lets you pass in
the function to apply. The ability to recognize cases where it is both useful and
intuitive is a talent worth cultivating.

Tip: As of Python 2.1, you can create an xreadlines object around any filelike object
that implements a readlines method:

import xreadlines
obj = SomeFileLikeObject()
for line in xreadlines.xreadlines(obj):
    pass # ... do some work ...


Summary 

Whether you’re using files or standard I/O, Python makes handling input and output
easy. In this chapter you:

♦ Printed information to the user’s console.

♦ Retrieved input from the keyboard.

♦ Learned to read and write text and binary files.

♦ Used filelike objects in place of actual file objects.

In the next chapter you’ll learn to use Python’s powerful string handling features.
With them you can easily search strings, match patterns, and manipulate strings in 
your programs. 
Strings are a very common format for data display, input,
and output. Python has several modules for manipulating
strings. The most powerful of these is the regular expression
module. Python also offers classes that can blur the
separation between a string (in memory) and a file (on disk).

This chapter covers all of the things you can do with strings, 
ordered from the crucial to the seldom used. 


Using String Objects 

String objects provide methods to search, edit, and format the
string. Because strings are immutable, these methods do not
alter the original string. They return a new string:

>>> bob="hi there"
>>> bob.upper() # Say it LOUDER!
'HI THERE'
>>> bob # bob is immutable, so he didn't mutate.
'hi there'
>>> import string
>>> string.upper(bob) # Module function, same as bob.upper
'HI THERE'

String object methods are also available (except as noted 
below) as functions in the string module. The corresponding 
module functions take, as an extra first parameter, the string 
object to operate on. 
Cross-Reference: See Chapter 3 for an introduction to string syntax and formatting in Python.


String formatting methods 


Several methods are available to format strings for printing or processing. You can
justify the string within a column, strip whitespace, or expand tabs.

ljust(width), center(width), rjust(width)

These methods left-justify, center, or right-justify a string within a column of a
given width. They pad the string with spaces as necessary. If the string is longer
than width, these methods return the original string.

This kind of formatting works in a monospaced font, such as Courier New, where all
characters have the same width. In a proportional font, strings with the same length
generally have different widths on the screen or printed page.

>>> "antici".ljust(10)+"pation".rjust(10)
'antici        pation'

lstrip, rstrip, strip

lstrip returns a string with leading whitespace removed, rstrip removes trailing
whitespace, and strip removes both. “Whitespace” characters are defined in
string.whitespace; they include spaces, tabs, and newlines.

>>> " hello world ".lstrip()
'hello world '
>>> _.rstrip() # Interpreter trick: _ = last expression value
'hello world'
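Here is a quick way to check the padding and stripping rules above. justify_demo is a hypothetical name and the sample strings are arbitrary:

```python
def justify_demo():
    """Return the results of the column-formatting and stripping methods on sample strings."""
    padded = '  hello world  '
    return (
        'antici'.ljust(10) + 'pation'.rjust(10),  # each side padded with 4 spaces
        'mid'.center(9),                          # 3 spaces on each side
        'too long'.ljust(3),                      # longer than width: returned unchanged
        padded.lstrip(),
        padded.rstrip(),
        padded.strip(),
    )
```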

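expandtabs([tabsize])

This method replaces each tab character in a string with enough spaces to reach
the next tab stop, where tab stops fall every tabsize columns, and returns the
result. The parameter tabsize is optional, defaulting to eight. Because the number
of spaces depends on the tab's column, this method is not equivalent to
replace("\t"," "*tabsize).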
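The difference from a plain replace shows up as soon as text precedes a tab. In this sketch (tab_demo is a hypothetical name), 'a' occupies column 0, so the first tab needs only 3 spaces to reach column 4:

```python
def tab_demo():
    """Contrast expandtabs, which pads to the next tab stop, with a plain replace."""
    s = 'a\tbc\td'
    return s.expandtabs(4), s.replace('\t', ' ' * 4)
```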


String case-changing methods 


You can convert strings to UPPERCASE, lowercase, and more. 

lower, upper 

These methods return a string with all characters shifted to lowercase and 
uppercase, respectively. They are useful for comparing strings when case is not 
important. 


capitalize, title, swapcase

The method capitalize returns a string with the first character shifted to uppercase
and the remaining characters shifted to lowercase.

The method title returns a string converted to “titlecase.” Titlecase is similar to the
way book titles are written: it places the first letter of each word in uppercase, and
all other letters in lowercase. Python assumes that any group of adjacent letters
constitutes one word.

The method swapcase returns a string where all lowercase characters are changed to
uppercase, and vice versa.

>>> "hello world".title()
'Hello World'
>>> "hello world".capitalize()
'Hello world'
>>> "hello world".upper()
'HELLO WORLD'
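The "any group of adjacent letters is a word" rule has a visible side effect on apostrophes. This sketch (case_demo is a hypothetical name) shows it alongside capitalize lowercasing the rest of the string:

```python
def case_demo():
    """Apply the case-changing methods to sample strings."""
    s = "they're here"
    return (
        s.capitalize(),            # only the first character is uppercased
        s.title(),                 # "re" after the apostrophe counts as a word
        s.upper(),
        'Hello World'.swapcase(),
        'hELLO'.capitalize(),      # the remaining characters are lowercased
    )
```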

String format tests (the is-methods) 

These methods do not have corresponding functions in the string module. Each
returns false for an empty string. For instance, "".isalpha() returns 0.

♦ isalpha: Returns true if each character is alphabetic. Alphabetic characters
are those in string.letters. Returns false otherwise.

♦ isalnum: Returns true if each character is alphanumeric. Alphanumeric
characters are those in string.letters or string.digits. Returns false otherwise.

♦ isdigit: Returns true if each character is a digit (from string.digits). Returns
false otherwise.

♦ isspace: Returns true if each character is whitespace (from string.whitespace).
Returns false otherwise.

♦ islower: Returns true if each letter in the string is lowercase, and the string
contains at least one letter. Returns false otherwise. For example:

>>> "2 + 2".islower() # No letters, so test fails!
0
>>> "2 plus 2".islower() # A-ok!
1

♦ isupper: Returns true if each letter in the string is uppercase, and the string
contains at least one letter. Returns false otherwise.

♦ istitle: Returns true if the letters of the string are in titlecase, and the string
contains at least one letter. Returns false otherwise. (See the title method
discussed previously for a description of titlecase.)
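To see which tests pass for a given string, you can call each is-method by name. This is a sketch with a hypothetical helper named classify. In later Python versions these methods return True and False rather than 1 and 0:

```python
def classify(s):
    """Return the names of the is-methods that report true for s."""
    tests = ['isalpha', 'isalnum', 'isdigit', 'isspace', 'islower', 'isupper', 'istitle']
    return [name for name in tests if getattr(s, name)()]
```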

String searching methods 

Strings offer various methods for simple searching. For more powerful searching, 
use the regular expressions module (covered later in this chapter). 
find(substring[, firstindex[, lastindex]])

Search for substring within the string. If found, return the index where the first
occurrence starts. If not found, return -1.

A call to str.find searches the slice str[firstindex:lastindex]. So, the
default behavior is to search the whole string, but you can pass values for firstindex
and lastindex to limit the search.

>>> str="the rest of the story"
>>> str.find("the")
0
>>> str.find("the", 1) # Start search at index 1.
12
>>> str.find("futplex")
-1

Here are some relatives of find, which you may find useful:

♦ index: Same syntax and effect as find, but raises the exception ValueError
if it doesn’t find the substring.

♦ rfind: Same as find, but returns the index of the last occurrence of the
substring.

♦ rindex: Same as index, but returns the index of the last occurrence of the
substring.
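Here is a sketch contrasting the two failure styles: find and rfind return -1, while index and rindex raise ValueError. locate is a hypothetical helper name:

```python
def locate(text, sub):
    """Report the first and last positions of sub (-1 if absent) and whether index() finds it."""
    first = text.find(sub)
    last = text.rfind(sub)
    try:
        text.index(sub)
        found = True
    except ValueError:  # index and rindex raise instead of returning -1
        found = False
    return first, last, found
```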

startswith(substr[,firstindex[,lastindex]])

Returns true if the string starts with substr. A call to str.startswith compares
substr against the slice str[firstindex:lastindex]. You can pass values for
firstindex and lastindex to test whether a slice of your string starts with substr.
There is no equivalent function in the string module.


endswith(substr[,firstindex[,lastindex]]) 

Same as startswith, but tests whether the string ends with substr. The string module 
contains no equivalent function. 


count(substr[,firstindex[,lastindex]]) 

Counts the number of occurrences of substr within a string. If you pass indices, 
count searches within the slice [firstindex:lastindex]. 

This example gives the answer to an old riddle: “What happens once today, three 
times tomorrow, but never in three hundred years?” 

>>> RiddleStrings = ["today","tomorrow","three hundred years"]
>>> for str in RiddleStrings: print str.count("o")

1
3
0
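The optional index arguments behave the same way for startswith, endswith, and count: each one acts on text[firstindex:lastindex]. This sketch uses a hypothetical function name, slice_tests:

```python
def slice_tests(text):
    """Exercise the optional firstindex/lastindex arguments, which act like text[first:last]."""
    return (
        text.startswith('rest', 4),   # does text[4:] start with 'rest'?
        text.endswith('the', 0, 15),  # does text[0:15] end with 'the'?
        text.count('t'),              # occurrences in the whole string
        text.count('t', 4, 8),        # occurrences in text[4:8] only
    )
```

For the string "the rest of the story", text[0:15] is "the rest of the" and text[4:8] is "rest".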
String manipulation methods

Strings provide various methods to replace substrings, split the string on delimiters,
or join a list of strings into a larger string.


translate(table[,deletestr]) 

Returns a string translated according to the translation string table. If you supply a
string deletestr, translate removes all characters in that string before applying the
translation table. The string table must have a length of 256; a character with ASCII
value n is replaced with table[n]. The best way to create such a string is with a
call to string.maketrans, as described below.

For example, this line of code converts a string to “camel case” style, with words
capitalized and concatenated. It also swaps exclamation points and question marks:

>>> import string
>>> ProductName="power smart report now?"
>>> ProductName.title().translate(string.maketrans("?!","!?"), string.whitespace)
'PowerSmartReportNow!'

replace(oldstr,newstr[,maxreplacements]) 

Returns a string with all occurrences of oldstr replaced by newstr. If you provide 
maxreplacements, replace replaces only the first maxreplacements occurrences of 
oldstr. 

>>> "Siamese cat".replace("c","b")
'Siamese bat'

split([separator[,maxsplits]])

Breaks the string at each occurrence of the string separator, and returns a list
of pieces. If you omit separator, split breaks the string on runs of whitespace
(the characters in string.whitespace). If you supply a value for maxsplits, then
split performs up to maxsplits splits, and no more.

This method is useful for dealing with delimited data:

>>> StockQuoteLine = "24-Nov-00,45.9375,46.1875,44.6875,45.1875,3482500,45.1875"
>>> ClosingPrice=float(StockQuoteLine.split(",")[4])
>>> ClosingPrice
45.1875
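Note that a multicharacter separator is matched as a whole string, not as a set of characters. The sketch below shows this along with the maxsplits limit and replace's optional count. split_demo is a hypothetical name and the inputs are arbitrary samples:

```python
def split_demo():
    """Show split's separator and maxsplits behavior, plus replace's optional count."""
    return (
        'a--b--c'.split('--'),       # the separator is matched as a whole string
        'x-y'.split('--'),           # no occurrence: a one-element list
        ' a  b\tc\n'.split(),        # no separator: split on runs of whitespace
        'a,b,c,d'.split(',', 2),     # at most 2 splits
        'a,,b'.split(','),           # adjacent separators yield an empty field
        'aaa'.replace('a', 'b', 2),  # at most 2 replacements
    )
```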

splitlines([keepends])

Splits a string on line breaks (carriage return and/or line feed). If you set keepends
to true, splitlines retains the terminating characters on each line. The string
module has no corresponding function. For example:

>>> "The\r\nEnd\n\n".splitlines()
['The', 'End', '']
>>> "The\r\nEnd\n\n".splitlines(1)
['The\015\012', 'End\012', '\012']
join(StringSequence)

Returns a string consisting of all the strings in StringSequence concatenated
together, using the string as a delimiter.

This method is generally used in the corresponding function form:
string.join(StringSequence[, Delimiter]). The default value of Delimiter is a
single space.

>>> Words=["Ready","Set","Go"]
>>> "...".join(Words) # weird-looking
'Ready...Set...Go'
>>> string.join(Words,"...") # equivalent, and more intuitive
'Ready...Set...Go'
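The following sketch puts join and splitlines side by side. join_demo is a hypothetical name:

```python
def join_demo():
    """Join a word list two ways and split a string into lines with and without line ends."""
    words = ['Ready', 'Set', 'Go']
    text = 'The\r\nEnd\n\n'
    return (
        '...'.join(words),
        ' '.join(words),
        text.splitlines(),
        text.splitlines(True),  # keepends: line terminators are retained
    )
```

Because splitlines(True) keeps the terminators, ''.join(text.splitlines(True)) rebuilds the original string exactly.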

encode([scheme[,errorhandling]])

Returns the same string, encoded in the encoding scheme scheme. The parameter
scheme defaults to the current encoding scheme. The parameter errorhandling
defaults to “strict,” indicating that encoding problems should raise a UnicodeError
exception (a subclass of ValueError). Other values for errorhandling are “ignore”
(do not raise any errors) and “replace” (replace un-encodable characters with a
replacement marker). See the section “Encoding Text” for more information.
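This sketch for modern Python 3 encodes a non-ASCII sample string to ASCII under each of the three error-handling values. encode_demo is a hypothetical name. In Python 3, encode returns bytes, hence the b'...' results:

```python
def encode_demo(text=u'caf\xe9'):
    """Encode text to ASCII under each error-handling scheme; None marks a raised error."""
    results = {}
    for handling in ('strict', 'ignore', 'replace'):
        try:
            results[handling] = text.encode('ascii', handling)
        except UnicodeError:  # UnicodeEncodeError is a subclass of UnicodeError
            results[handling] = None
    return results
```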


Using the String Module 

Because strings have so many useful methods, it is often not necessary to import
the string module. But the string module does provide many useful members.

Character categories

The string module includes several constant strings that categorize characters as
letters, digits, punctuation, and so forth. Avoid editing these strings, as doing so
may break standard routines.

♦ letters: All characters considered to be letters; consists of lowercase +
uppercase.

♦ lowercase: All lowercase letters.

♦ uppercase: All uppercase letters.

♦ digits: The string '0123456789'.

♦ hexdigits: The string '0123456789abcdefABCDEF'.

♦ octdigits: The string '01234567'.

♦ punctuation: String of all the characters considered to be punctuation.

♦ printable: All the characters that are considered printable. Consists of
digits, letters, punctuation, and whitespace.

♦ whitespace: All characters that are considered whitespace. On most systems
this string includes the characters space, tab, linefeed, return, formfeed,
and vertical tab.
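These constants make simple membership tests easy. The sketch below uses hypothetical helper names and relies only on hexdigits and octdigits, which keep the same names in later Python versions:

```python
import string


def is_hex_number(s):
    """True if s is non-empty and every character is in string.hexdigits."""
    return len(s) > 0 and all(c in string.hexdigits for c in s)


def is_octal_number(s):
    """True if s is non-empty and every character is in string.octdigits."""
    return len(s) > 0 and all(c in string.octdigits for c in s)
```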

Miscellaneous functions 

Most of the functions in the string module correspond to methods of a string 
object, and are covered in the section on string methods. The other functions, 
which have no equivalent object methods, are covered here. 

atoi,atof,atol 

The function string.atoi(str) returns an integer value of str, and raises a
ValueError if str does not represent an integer. It is equivalent to the built-in
function int(str).

The function atof(str) converts a string to a float; it is equivalent to the float
function.

The function atol(str) converts a string to a long integer; it is equivalent to the
long function.

>>> print string.atof('3.5')+string.atol('2')
5.5

capwords(str) 

Splits a string (on whitespace) into words, capitalizes each word, then joins the 
words together with one space between them: 

>>> string.capwords("The end...or is it?")
'The End...or Is It?'

maketrans(fromstring,tostring) 

Creates a translation table suitable for passing to translate (or to regex.compile).
The translation table instructs translate to replace the nth character in fromstring
with the nth character in tostring. The strings fromstring and tostring must have the
same length.

The translation table is a string of length 256, one character for each possible
8-bit character code, but with fromstring[n] replaced by tostring[n].
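In Python 3 this functionality moved to the str type. str.maketrans builds the table, which is a dictionary there rather than a 256-character string, and the str.translate method applies it. The following is a minimal ROT13 sketch that assumes Python 3. The names rot13 and ROT13_TABLE are hypothetical.

```python
import string

lower = string.ascii_lowercase
upper = string.ascii_uppercase

# fromstring and tostring have the same length (52): each letter maps to
# the letter 13 places later, wrapping around the alphabet.
ROT13_TABLE = str.maketrans(lower + upper,
                            lower[13:] + lower[:13] + upper[13:] + upper[:13])

def rot13(text):
    return text.translate(ROT13_TABLE)

print(rot13("Hello, World"))  # Uryyb, Jbeyq
print(rot13(rot13("abc")))    # abc: applying ROT13 twice restores the text
```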
splitfields, joinfields

These functions have the same effect as split and join, respectively. (Before
Version 2.0, splitfields and joinfields accepted a string of separators, and
split and join did not.)


zfill(str,width) 

Given a numeric string str and a desired width width, returns an equivalent numeric
string padded on the left by zeroes. Similar to rjust, except that a leading sign stays
in front of the zeroes. For example:

>>> string.zfill("-3",5)

'-0003' 
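Later Python versions, including Python 3, also provide zfill and rjust as string methods. That makes the difference between the two easy to see in a quick sketch, which assumes Python 3.

```python
print("-3".zfill(5))       # -0003: the sign stays in front of the zeroes
print("+7".zfill(4))       # +007
print("-3".rjust(5, "0"))  # 000-3: rjust pads blindly, ignoring the sign
print("12345".zfill(3))    # 12345: zfill never truncates
```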


Defining Regular Expressions 

A regular expression is an object that matches some collection of strings. You can
use regular expressions to search and transform strings in sophisticated ways.
Regular expressions use their own special syntax to describe strings to match.

They can be very efficient, but also very cryptic if taken to extremes. Regular
expressions are widely used in the UNIX world. The module re provides full support for
Perl-like regular expressions in Python.

The re module raises the exception re.error if an error occurs while compiling or
using a regular expression.
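For example, a program that accepts patterns from a user can catch re.error to reject malformed ones. Here is a small sketch that runs in Python 2 and Python 3. is_valid_pattern is a hypothetical helper.

```python
import re

def is_valid_pattern(pattern):
    """Return True if pattern compiles, False if the re module rejects it."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(is_valid_pattern(r"[0-9]+/[0-9]+"))  # True
print(is_valid_pattern("(unclosed"))       # False: missing parenthesis
print(is_valid_pattern("*oops"))           # False: nothing to repeat
```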

Prior to Version 1.5, the modules regex and regsub provided support for regular 
expressions. These modules are now deprecated. 

Regular expression syntax 

The definition of a regular expression is a string. In general, a character in the regular
expression’s definition matches a character in a target string. For example, the
regular expression defined by fred matches the string “fred,” and no others. Some
characters have special meanings that permit more sophisticated matching.

. A period (dot) matches any character except a newline. For example,
b.g matches “big,” “bag,” or “bqg,” but not “b\ng.” If the DOTALL flag is
set, then dot matches any character, including a newline.

[] Brackets specify a set of characters to match. For example, p[ie]n
matches “pin” or “pen” and nothing else. A set can include ranges: the set
[a-ex-z] is equivalent to [abcdexyz]. Starting a set with ^ means
“match any character except these.” For example, b[^ae]d matches “bid”
or “b%d,” but not “bad” or “bed.”

* An asterisk indicates that the preceding regular expression is optional,
and may occur any number of times. For example, ba*n* matches
“baaaa” or “bnn” or simply “b”; within “banana” it matches only the leading “ban.”
+ A plus sign indicates that the preceding regular expression must occur at
least once, and may occur many times. For example, [sweatrd]+
matches various words, the longest of which is “stewardesses.” The regular
expression [0-9]+/[0-9]+ matches fractions like “13/64” or “2/3.”

? A question mark indicates that the preceding regular expression is
optional, and can occur, at most, once. For example, col?d matches
either “cod” or “cold,” but not “colld.” The question mark has other
uses, explained below in the sections on “Nongreedy matching” and
“Extensions.”

{m,n} The general notation for repetition is two numbers in curly braces. This
syntax indicates that the preceding regular expression must appear at
least m times, but no more than n times. If m is omitted, it defaults to 0.
If n is omitted, it defaults to infinity. For example, [^a-zA-Z]{3,}
matches any sequence of at least three non-alphabetic characters.

^ A caret matches the beginning of the string. If the MULTILINE flag is set,
it also matches the beginning of a new line. For example, ^bob matches
“bobsled” but not “discombobulate.” Note that the caret has an unrelated
meaning inside brackets [].

$ A dollar sign matches the end of the string. If the MULTILINE flag is set,
it also matches the end of a line. For example, is$ matches “this” but
not “fish.” It matches “This\nyear” only if the MULTILINE flag is set.

| A vertical bar splits a regular expression into two parts, and matches
either the first half or the last half. For example, ab|cd matches the
strings “ab” and “cd.”

() Enclosing part of a regular expression in parentheses does not change
matching behavior. However, Python flags the regular expression
enclosed in parentheses as a group. After the first match, you can match
the group again using backslash notation. For instance, the regular
expression ^[\w]*(\w)\1[\w]*$ matches a single word with double
letters, like “pepper” or “narrow” but not “salt” or “wide.” (The syntax
\w, explained below, matches any alphanumeric character.) A regular expression
can have up to 99 groups, which are numbered starting from 1.

Grouping is useful even if the group is only matched once. For example,
Ste(ph|v)en matches “Stephen” or “Steven.” Without parens,
Steph|ven matches only the strings “Steph” and “ven.”

Python also uses parentheses in extensions (see “Extensions” later in 
this chapter). 

\ Escape special characters. You can use a backslash to escape any special
character. For example, ca\$h matches the string “ca$h.” Note that
without the backslash, ca$h could never match anything, because nothing
but a line break or the end of the string can follow the position where $ matches.
The backslash also forms character groups, as described below.
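The following sketch checks several of the examples above and assumes Python 3. The hypothetical helper whole_match wraps a pattern in (?:...)\Z, so the pattern must account for the entire target string, which is how the examples above are phrased. The caret and dollar examples are tested separately with re.search.

```python
import re

def whole_match(pattern, text):
    """True if pattern matches all of text, not just a prefix of it."""
    # (?:...) keeps an alternation such as ab|cd together; \Z anchors the end.
    return re.match("(?:" + pattern + r")\Z", text) is not None

print(whole_match(r"b.g", "bqg"))              # True
print(whole_match(r"b.g", "b\ng"))             # False: dot skips newlines
print(whole_match(r"b[^ae]d", "b%d"))          # True
print(whole_match(r"col?d", "colld"))          # False
print(whole_match(r"[^a-zA-Z]{3,}", "42!"))    # True
print(whole_match(r"Ste(ph|v)en", "Steven"))   # True
print(whole_match(r"\w*(\w)\1\w*", "narrow"))  # True: the group is reused via \1
print(whole_match(r"ca\$h", "ca$h"))           # True
print(re.search(r"^bob", "discombobulate"))                       # None
print(re.search(r"is$", "This\nyear") is not None)                # False
print(re.search(r"is$", "This\nyear", re.MULTILINE) is not None)  # True
```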
Backslashes and raw strings

You should generally write the Python string defining a regular expression as a raw 
string. Otherwise, because you must escape backslashes in the regular expression’s 
definition, the excessive backslashes become confusing: 

>>> ThePath = "c:\\temp\\download\\"
>>> print ThePath
c:\temp\download\
>>> re.search(r"c:\\temp",ThePath) # Raw. Reasonably clear.
<SRE_Match object at 007CC7A8>
>>> re.search("c:\\temp",ThePath) # No match!
>>> re.search("c:\\\\temp",ThePath) # Less clear than raw
<SRE_Match object at 007ACFD0>

The second search fails to find a match, because the regular expression defined by
c:\temp matches only the string consisting of “c:,” then a tab, then “emp”!
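The following sketch puts the three spellings side by side. It assumes Python 3, although the re calls are the same in Python 2. It compares the pattern sources directly, so you can see that the raw and doubled-backslash versions are literally the same string.

```python
import re

path = "c:\\temp\\download\\"   # the characters c:\temp\download\

raw_pattern = r"c:\\temp"        # regex source c:\\temp (an escaped backslash)
cooked_pattern = "c:\\\\temp"    # the same eight characters without r""
tab_pattern = "c:\\temp"         # regex source c:\temp, where \t means TAB

print(raw_pattern == cooked_pattern)                   # True
print(re.search(raw_pattern, path) is not None)        # True
print(re.search(tab_pattern, path) is not None)        # False
print(re.search(tab_pattern, "c:\temp") is not None)   # True: "c:\temp" holds a real tab
```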

Character groups and other backslash magic 

In addition to escaping special characters, you can also use the backslash in conjunction
with a letter to match various things. A rule of thumb is that if backslash
plus a lowercase letter matches something, backslash plus the uppercase letter
matches the opposite.

\1, \2, etc. Matches a numbered group. If part of a regular expression is
enclosed in parentheses, Python flags it as a group. Python numbers
groups, starting from 1 and proceeding to 99. You can match
groups again by number. For example, (.+) \1 matches the
names of ’80s bands “The The,” “Mister Mister,” and “Duran
Duran.”

Python interprets escaped three-digit numbers, or numbers starting
with 0, as the octal value of a character. For example, \012
matches a newline.

Inside set brackets [], Python treats all escaped numbers as
characters.

\A Matches the start of the string: equivalent to ^ when the MULTILINE
flag is not set.

\b Matches a word boundary. Here “word” means “sequence of
alphanumeric characters.” For example, snow\b matches “snow
angel” but not “snowball.” Note that inside set brackets [], \b
indicates backspace, just as it would in an ordinary string. An
ordinary (non-raw) Python string also turns \b into a backspace
before the re module sees it. For instance, the non-raw string
"bozo\b\b\b\bgentleman" matches the string consisting of “bozo,”
four backspace characters, then “gentleman.”

\B Matches a non-word-boundary. For example, \Bne\B matches
part of “planet,” but not “nest” or “lane.”

\d Matches a digit: equivalent to [0-9]. 
\D Matches a non-digit: equivalent to [^0-9].

\s Matches a whitespace character: equivalent to [ \t\n\r\f\v].

\S Matches a non-whitespace character: equivalent to [^ \t\n\r\f\v].

\w Matches an alphanumeric character: equivalent to
[a-zA-Z0-9_]. If the LOCALE flag is set, \w matches [0-9_] or any
character defined as alphabetic in the current locale. If the
UNICODE flag is set, matches [0-9_] or any character marked as
alphanumeric in the full Unicode character set.

\W Matches a non-alphanumeric character.

\Z Matches only at the very end of the string. Unlike $, it is not affected
by the MULTILINE flag and does not match just before a final newline.

\\ Matches a backslash. (Similarly, \. matches a period, \? matches
a question mark, and so forth.)
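The following quick sketch, assuming Python 3, exercises several of these escapes. It includes the backspace meaning of \b inside brackets and the difference between $ and \Z at a trailing newline.

```python
import re

print(re.findall(r"\d+", "13 cats, 2 dogs"))              # ['13', '2']
print(re.findall(r"\S+", " two  words "))                 # ['two', 'words']
print(re.search(r"snow\b", "snow angel") is not None)    # True
print(re.search(r"snow\b", "snowball") is not None)      # False
print([w for w in ["planet", "nest", "lane"] if re.search(r"\Bne\B", w)])  # ['planet']
print(re.match(r"(.+) \1$", "Duran Duran").group(1))     # Duran
print(re.findall(r"[\b]", "a\bb"))                       # ['\x08']: backspace inside []
print(re.search(r"end$", "the end\n") is not None)       # True: $ allows a final newline
print(re.search(r"end\Z", "the end\n") is not None)      # False: \Z does not
```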

Nongreedy matching 

The repetition operators ?, +, * and {m,n} normally match as much of the target
string as possible. You can modify the operators with a question mark to be “nongreedy,”
and match as little of the target string as possible. For example, when
matched against the string “over the top,” \b.+\b would normally match the entire
string. The corresponding nongreedy version, \b.+?\b, matches only the first
word, “over.” (With *? instead of +?, the shortest possible match would be the
empty string at the first word boundary.)
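The following sketch, assuming Python 3, shows the same contrast. It adds the common case of matching markup tags, where the greedy version runs past the first closing bracket.

```python
import re

text = "over the top"
print(re.search(r"\b.+\b", text).group())         # over the top
print(re.search(r"\b.+?\b", text).group())        # over
print(repr(re.search(r"\b.*?\b", text).group()))  # '' : empty match at the first boundary

html = "<b>bold</b>"
print(re.findall(r"<.+>", html))    # ['<b>bold</b>']
print(re.findall(r"<.+?>", html))   # ['<b>', '</b>']
```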


Extensions 

Syntax of the form (?...) marks a regular expression extension. The meaning of
the extension depends on the character after the question mark.


(?#...) Is a comment. Python ignores this portion of the regular
expression.

(?P<name>...) Creates a named group. Named groups work like numbered
groups. You can match them again using (?P=name). For example,
this regular expression matches a single word that begins and
ends with the same letter: ^(?P<letter>\w)\w*(?P=letter)$.

A named group receives a number, and can be referred to by number
or by name.

(?:...) Are non-grouping parentheses. You can use these to enhance readability;
they group part of the expression without creating a numbered group. For
example, (?:\w+)(\d)\1 matches one or more letters followed
by a repeated digit, such as “bob99” or “x22.” The string (?:\w+)
does not create a group, so \1 matches the first group, (\d).


(?i), (?L), (?m), (?s), (?u), (?x) Are REs that set the flags re.I, re.L, re.M,
re.S, re.U, and re.X, respectively. Note that (?L) uses an uppercase letter; the
others are lowercase.
(?=...) Is a lookahead assertion. Python matches the enclosed regular
expression, but does not “consume” any of the target string. For
example, blue(?=berry) matches the string “blue,” but only if
it is followed by “berry.”

(?!...) Is a negative lookahead assertion. The enclosed regular
expression must not match the target string. For example,
electron(?!ic\b) matches the string “electron” only when it
is not part of the word “electronic.”
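Here is a short sketch of the extensions in action, assuming Python 3. The inline flag (?i) sits at the very start of its pattern, as recent Python versions require.

```python
import re

same_ends = re.compile(r"^(?P<letter>\w)\w*(?P=letter)$")
m = same_ends.match("level")
print(m.group("letter"), m.group(1))    # l l: the named group is also group 1
print(same_ends.match("abc"))           # None

m = re.match(r"(?:\w+)(\d)\1", "bob99")
print(m.group(), m.groups())            # bob99 ('9',): only (\d) is a group

print(re.search(r"blue(?=berry)", "blueberry").group())         # blue: "berry" is not consumed
print(re.search(r"blue(?=berry)", "blue sky"))                  # None
print(re.search(r"electron(?!ic\b)", "electronic"))             # None
print(re.search(r"electron(?!ic\b)", "electrons") is not None)  # True
print(re.match(r"(?i)hello(?#ignored)", "HELLO") is not None)   # True
```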


Creating and Using Regular Expression Objects

The function re.compile(pattern[, flags]) compiles the specified pattern
string and returns a new regular expression object. The optional parameter flags
tweaks the behavior of the expression. Each flag value has a long name and an
equivalent short name.


You can combine flags using bitwise or. For example, this line returns a regular
expression that searches for two occurrences of the word “the,” ignoring case, with
any single character (including newline) in between.

re.compile("the.the", re.IGNORECASE | re.DOTALL)


re.IGNORECASE, re.I   Performs case-insensitive matching.

re.LOCALE, re.L       Interprets words according to the current locale.
                      This interpretation affects the alphabetic group
                      (\w and \W), as well as word boundary behavior
                      (\b and \B).

re.MULTILINE, re.M    Makes $ match the end of a line (not just the end
                      of the string) and makes ^ match the start of any
                      line (not just the start of the string).

re.DOTALL, re.S       Makes a period (dot) match any character, including
                      a newline.

re.UNICODE, re.U      Interprets letters according to the Unicode character
                      set. This flag affects the behavior of \w, \W, \b,
                      and \B.

re.VERBOSE, re.X      Permits “cuter” regular expression syntax. It
                      ignores whitespace (except inside a set [] or when
                      escaped by a backslash), and treats unescaped #
                      as a comment marker. For example, the following
                      two lines of code are equivalent. They match a
                      single word containing three consecutive pairs of
                      doubled letters, such as “zrqqxxyy.” (Finding an
                      English word matching this description is left as
                      an exercise for the reader.) Note that the second
                      VERBOSE form of the regular expression is a bit
                      more readable.

NewRE = re.compile(r"^\w*(\w)\1(\w)\2(\w)\3\w*$")

NewRE = re.compile(r"^\w* (\w)\1 (\w)\2 (\w)\3 \w*$ # three doubled letters",
    re.VERBOSE)
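The following sketch, assuming Python 3, checks that the compact and VERBOSE spellings agree. It then shows the effect of combining IGNORECASE with DOTALL, and of MULTILINE on ^. The words tested are arbitrary; the exercise above is left intact.

```python
import re

COMPACT = re.compile(r"^\w*(\w)\1(\w)\2(\w)\3\w*$")
DOUBLED = re.compile(r"""
    ^\w*       # any leading letters
    (\w)\1     # first doubled letter
    (\w)\2     # second doubled letter, right after the first
    (\w)\3     # third doubled letter
    \w*$       # any trailing letters
    """, re.VERBOSE)

for word in ["zrqqxxyy", "aabbcc", "committee"]:
    print(word, bool(COMPACT.match(word)), bool(DOUBLED.match(word)))
# zrqqxxyy True True / aabbcc True True / committee False False

THE_THE = re.compile("the.the", re.IGNORECASE | re.DOTALL)
print(bool(THE_THE.search("THE\nthe")))               # True
print(bool(re.search("the.the", "THE\nthe", re.I)))   # False: without DOTALL, dot skips newline
print(re.findall(r"^\w+", "one\ntwo\nthree", re.M))   # ['one', 'two', 'three']
print(re.findall(r"^\w+", "one\ntwo\nthree"))         # ['one']
```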

Using regular expression objects 

You can use regular expressions to search, replace, split strings, and more. 


search(targetstring[,startindex[,endindex]]) 

The core use of a regular expression! The method search(targetstring) scans
through targetstring looking for a match. If it finds one, it returns a MatchObject
instance. If it finds no match, it returns None. (See below for MatchObject methods.)
The search method searches the slice targetstring[startindex:endindex], by
default the whole string.

The character ^ matches the beginning of the entire string (or, with MULTILINE, the
start of a line), not necessarily the position startindex. For example, ^friends$ does
not match the string “are friends electric?” even if one searches only the slice
“friends” from index 4 to index 11.
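In current CPython (and, as far as the Python 2 documentation describes it, in Python 2 as well), endindex does behave like the end of the string for $, while ^ still looks for index 0. The following sketch assumes Python 3 and shows both behaviors.

```python
import re

s = "are friends electric?"
print(s[4:11])                                               # friends
print(re.compile(r"friends").search(s, 4, 11).span())        # (4, 11)
print(re.compile(r"^friends$").search(s, 4, 11))             # None
print(re.compile(r"^friends$").search(s[4:11]) is not None)  # True: a real slice starts at 0
print(re.compile(r"friends$").search(s, 4, 11) is not None)  # True: endindex acts as the end
```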


match(targetstring[,startindex[,endindex]]) 

Attempts to match the regular expression against the first part of targetstring. The
match method is more restrictive than search, as it must match the first zero or
more characters of targetstring. It returns a MatchObject instance if it finds a match,
None otherwise. The parameters startindex and endindex function here as they do
in search.
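A few one-liners make the difference between match and search concrete. This sketch assumes Python 3.

```python
import re

print(re.match("b", "abc"))                          # None: match starts at index 0
print(re.search("b", "abc").start())                 # 1
print(repr(re.match("x*", "abc").group()))           # '' : zero characters is still a match
print(re.compile("c").match("abc", 2) is not None)   # True: startindex moves the starting point
```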

findall(targetstring) 

Matches against targetstring and returns a list of nonoverlapping matches. For 
example: 

>>> re.compile(r"\w+").findall("the larch") # Greedy matching
['the', 'larch']

>>> re.compile(r"\w+?").findall("the larch") # Nongreedy
['t', 'h', 'e', 'l', 'a', 'r', 'c', 'h']

If the regular expression contains a group, the list returned is a list of group values
(in tuple form, if it contains multiple groups). For example:

>>> re.compile(r"(\w+)(\w+)").findall("the larch")
[('th', 'e'), ('larc', 'h')]
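The following sketch, assuming Python 3, shows how the number of groups changes the shape of the result. The last line shows that findall never returns overlapping matches.

```python
import re

pairs = "x=1, y=22, z=333"
print(re.findall(r"\w+=\d+", pairs))      # ['x=1', 'y=22', 'z=333']
print(re.findall(r"(\w+)=\d+", pairs))    # ['x', 'y', 'z']: one group gives plain strings
print(re.findall(r"(\w+)=(\d+)", pairs))  # [('x', '1'), ('y', '22'), ('z', '333')]
print(re.findall("aa", "aaaaa"))          # ['aa', 'aa']: matches never overlap
```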
split(targetstring[,maxsplit])

Breaks targetstring on each match of the regular expression, and returns a list of
pieces. If the regular expression contains groups, the list of pieces also includes the
text matched by each group, so wrapping the whole expression in a single group keeps
the delimiting strings; otherwise, the list of pieces does not include the delimiters.
If you specify a nonzero value for maxsplit, then split makes, at most, maxsplit cuts,
and the remainder of the string remains intact.

For example, this regular expression removes all ifs, ands, and buts from a string: 

>>> MyRE=re.compile(r"\bif\b|\band\b|\bbut\b",re.I)
>>> LongString="I would if I could, and I wish I could, but I can't."
>>> MyRE.split(LongString)
['I would ', ' I could, ', ' I wish I could, ', " I can't."]
>>> MyRE=re.compile(r"(\bif\b|\band\b|\bbut\b)",re.I)
>>> MyRE.split(LongString) # Keep the matches in the list.
['I would ', 'if', ' I could, ', 'and', ' I wish I could, ', 'but',
" I can't."]
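Here are three more split cases in a quick sketch, assuming Python 3. They show a delimiter with optional spaces, a group that keeps the delimiters, and maxsplit (passed as a keyword).

```python
import re

print(re.split(r"\s*,\s*", "a , b,c ,d"))     # ['a', 'b', 'c', 'd']
print(re.split(r"(\d)", "a1b2c"))             # ['a', '1', 'b', '2', 'c']: groups are kept
print(re.split(r",", "a,b,c,d", maxsplit=2))  # ['a', 'b', 'c,d']
```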

sub(replacement, targetstring[, count]) 

Search for the regular expression in targetstring, and perform a substitution at each
match. The parameter replacement can be a string. It can also be a function that
takes a MatchObject as an argument, and returns a string. If you specify a nonzero
value for count, then sub makes, at most, count substitutions.

This example translates a string to “Pig Latin.” (It moves any leading consonant
cluster to the end of the word, then adds “ay,” so that “chair” becomes “airchay.”)

>>> def PigLatinify(thematch):
...     return thematch.group(2)+thematch.group(1)+"ay"
...
>>> WordRE=re.compile(r"\b([b-df-hj-np-tv-z]*)(\w+)\b",re.I)
>>> WordRE.sub(PigLatinify, "fetch a comfy chair")
'etchfay aay omfycay airchay'

If replacement is a string, it can contain references to groups from the regular
expression. For example, sub replaces \1 or \g<1> in replacement with the first group
from the regular expression. You can insert named groups with the syntax \g<name>.

The sub method replaces empty (length-0) matches only if they are not adjacent to
another substitution.


subn(replacement, targetstring[, count]) 

Same as sub, but returns a two-tuple whose first element is the new string, and 
whose second element is the number of substitutions made. 
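The following sketch, assuming Python 3, covers the replacement forms described above: numbered and named group references, a replacement function, the count limit, and subn's tuple result. count is passed as a keyword, which works in both Python 2 and Python 3.

```python
import re

print(re.sub(r"(\w+), (\w+)", r"\2 \1", "Lovelace, Ada"))         # Ada Lovelace
print(re.sub(r"(?P<last>\w+), (?P<first>\w+)", r"\g<first> \g<last>",
             "Hopper, Grace"))                                     # Grace Hopper
print(re.sub(r"\d+", lambda m: str(int(m.group()) * 2),
             "3 apples, 10 pears"))                                # 6 apples, 20 pears
print(re.sub(r"\d", "#", "a1b2c3", count=2))                       # a#b#c3
print(re.subn(r"\d", "#", "a1b2c3"))                               # ('a#b#c#', 3)
```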
Applying regular expressions without compiling

The methods of a regular expression object correspond to functions in the re
module. If you call these functions directly, you don’t need to call re.compile in
your code. However, if you plan to use a regular expression several times, it is more
efficient to compile and reuse it. The following module functions are available:


escape(str) 

Returns a copy of str with all special characters escaped. This feature is useful for
making a regular expression for an arbitrary string. For example, this function searches
for a substring in a larger string, just like string.find, but case-insensitively:

import re

def InsensitiveFind(BigString,SubString):
    TheMatch = re.search(re.escape(SubString),BigString,re.I)
    if (TheMatch):
        return TheMatch.start()
    else:
        return -1
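To see why the escaping matters, here is a sketch assuming Python 3. find_insensitive is a hypothetical lowercase-style rewrite of InsensitiveFind, and the last line shows the same search without re.escape.

```python
import re

def find_insensitive(big, sub):
    """Index of sub in big, ignoring case, or -1 (like InsensitiveFind)."""
    match = re.search(re.escape(sub), big, re.IGNORECASE)
    return match.start() if match else -1

print(find_insensitive("Hello World", "WORLD"))  # 6
print(find_insensitive("pi is 3.14", "3.14"))    # 6
print(find_insensitive("345", "3.5"))            # -1: the escaped dot is literal
print(re.search("3.5", "345") is not None)       # True: an unescaped dot matches '4'
```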

search(pattern,targetstring[,flags])

Compiles pattern into a regular expression object with flags set, then uses it to perform
a search against targetstring.


match(pattern,targetstring[,flags]) 

Compiles pattern into a regular expression object with flags set, then uses it to perform
a match against targetstring.


split(pattern,targetstring[,maxsplit]) 

Compiles pattern into a regular expression object, then uses it to split targetstring.


findall(pattern,targetstring) 

Compiles pattern into a regular expression object, then uses it to find all matches in 
targetstring. 


sub(pattern,replacement,targetstring[,count]) 

Compiles pattern into a regular expression object, then calls its sub method with 
parameters replacement, targetstring, and count. The function subn is similar, but 
calls the subn method instead. 


Using Match Objects 

Searching with a regular expression object returns a MatchObject, or None if the
search finds no matches. The match object has several methods, mostly to provide 
details on groups used in the match. 
group([groupid,...])

Returns the substring matched by the specified group. For index 0, it returns the 
substring matched by the entire regular expression. If you specify several group 
identifiers, group returns a tuple of substrings for the corresponding groups. If the 
regular expression includes named groups, groupid can be a string. 

groups([nomatch]) 

Returns a tuple of substrings matched by each group. If a group was not part of 
the match, its corresponding substring is nomatch. The parameter nomatch defaults 
to None. 

groupdict([nomatch]) 

Returns a dictionary. Each entry’s key is a group name, and the value is the substring
matched by that named group. If a group was not part of the match, its corresponding
value is nomatch, which defaults to None.

This example creates a regular expression with four named groups. The expression 
parses fractions of the form “1 1/3,” splitting them into integer part, numerator, and 
denominator. Non-fractions are matched by the “plain” group. 

>>> FractionRE=re.compile(
...     r"(?P<plain>^\d+$)?(?P<int>\d+(?= ))? ?(?P<num>\d+(?=/))?/?(?P<den>\d+$)?")
>>> FractionRE.match("1 1/3").groupdict()
{'den': '3', 'num': '1', 'plain': None, 'int': '1'}
>>> FractionRE.match("42").groupdict("x")
{'den': 'x', 'num': 'x', 'plain': '42', 'int': 'x'}

start([groupid]), end([groupid]), span([groupid]) 

The methods start and end return the indices of the substring matched by the
group identified by groupid. If the specified group didn’t contribute to the match,
they return -1.

The method span(groupid) returns both indices in tuple form:
(start(groupid),end(groupid)).

By default, groupid is 0, indicating the entire regular expression. 
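The following sketch, assuming Python 3, puts the match object methods together. DATE is a hypothetical pattern with an optional named group, so you can see what happens to a group that did not take part in the match.

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d\d)(?:-(?P<day>\d\d))?")
text = "Released 2000-10 to the public"
m = DATE.search(text)

print(m.group())                      # 2000-10
print(m.group("year", 2))             # ('2000', '10'): names and numbers can be mixed
print(m.groups())                     # ('2000', '10', None): day did not take part
print(m.groups("??"))                 # ('2000', '10', '??')
print(sorted(m.groupdict().items()))  # [('day', None), ('month', '10'), ('year', '2000')]
print(m.span(), m.start("day"))       # (9, 16) -1
print(m.pos, m.endpos, m.re is DATE, m.string is text)  # 0 30 True True
```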
re, string, pos, endpos

These members hold the parameters passed to search or match:

♦ re: The regular expression object used in the match
♦ string: The string used in the match
♦ pos: First index of the substring searched against
♦ endpos: Index just past the end of the substring searched against


Treating Strings as Files 

The module StringIO defines a class named StringIO. This class wraps an in-memory
string buffer, and supports standard file operations. Since a StringIO instance does
not correspond to an actual file, calling its close method simply frees the buffer.
The StringIO constructor takes, as a single optional parameter, an initial string for
the buffer.

The method getvalue returns the contents of the buffer. It is equivalent to calling
seek(0) and then read().

See Chapter 8 for a description of the standard file operations.


The module cStringIO defines a similar class, also named StringIO. Because
cStringIO.StringIO is implemented in C, it is faster than StringIO.StringIO; the one
drawback is that you cannot subclass it. The module cStringIO defines two additional
types: InputType is the type for StringIO objects constructed with a string
parameter, and OutputType is the type for StringIO objects constructed without a
string parameter.

The StringlO class is useful for building up long strings without having to do many 
small concatenations. For instance, the function demonstrated in Listing 9-1 builds 
up an HTTP request string, suitable for transmission to a Web server: 



Listing 9-1: httpreq.py 


import re
import urlparse
import cStringIO
import string
import socket

STANDARD_HEADERS = ("HTTP/1.1\r\n"
    "Accept: image/gif, image/x-xbitmap, image/jpeg, */*\r\n"
    "Accept-Language: en-us\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "User-Agent: Mozilla/4.0 (compatible)\r\n")

def CreateHTTPRequest(URL, CookieDict):
    """Create an HTTP request for a given URL (as returned by
    urlparse.urlparse) and a dictionary of cookies (where key
    is the host string, and the value is the cookie in the
    form "param=value")."""
    Buffer = cStringIO.StringIO()
    Buffer.write("GET ")
    FileString = URL[2] # File name
    if URL[3]!="": # Path parameters
        FileString = FileString + ";" + URL[3]
    if URL[4]!="": # Query parameters
        FileString = FileString + "?" + URL[4]
    FileString = string.replace(FileString," ","%20")
    # Request line "GET <path> HTTP/1.1", followed by the fixed headers.
    Buffer.write(FileString+" ")
    Buffer.write(STANDARD_HEADERS)

    # Add cookies to the request.
    GotCookies=0
    for HostString in CookieDict.keys():
        # Perform a case insensitive search. (Call re.escape so
        # special characters like . are searched for normally.)
        if (re.search(re.escape(HostString),URL[1],re.I)):
            if (GotCookies==0):
                Buffer.write("Cookie: ")
                GotCookies=1
            else:
                Buffer.write("; ")
            Buffer.write(CookieDict[HostString])
    if (GotCookies):
        Buffer.write("\r\n")
    Buffer.write("Host: "+URL[1])
    # A blank line ends the request headers.
    Buffer.write("\r\n\r\n")
    RequestString=Buffer.getvalue()
    Buffer.close()
    return RequestString

if (__name__=="__main__"):
    CookieDict={}
    CookieDict["python"] = "cookie1=value1"
    CookieDict["python.ORG"] = "cookie2=value2"
    CookieDict["amazon.com"] = "cookie3=value3"
    URL = urlparse.urlparse("http://www.python.org/2.0/")
    print CreateHTTPRequest(URL,CookieDict)
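In Python 3 the StringIO and cStringIO modules are gone, and the in-memory text file lives in the io module as io.StringIO. The following small sketch assumes Python 3 but tries the Python 2 import first. join_fields is a hypothetical helper that builds one string from many small writes, in the spirit of Listing 9-1.

```python
try:
    from cStringIO import StringIO   # Python 2, as described above
except ImportError:
    from io import StringIO          # Python 3 home of the in-memory text file

def join_fields(fields, separator=","):
    """Build one line of text from many small writes into a StringIO buffer."""
    buf = StringIO()
    for index, field in enumerate(fields):
        if index:
            buf.write(separator)
        buf.write(str(field))
    text = buf.getvalue()
    buf.close()
    return text

reader = StringIO("first line\nsecond line\n")
first = reader.readline()   # 'first line\n'
rest = reader.read()        # 'second line\n': reading continues where readline stopped
print(join_fields(["GET", "/2.0/", "HTTP/1.1"], " "))  # GET /2.0/ HTTP/1.1
```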
Encoding Text

All digital data, including text, is ultimately represented as ones and zeroes. A
character set is a way of encoding text as binary numbers. For example, the ASCII
character set represents letters and other characters using numbers from 0 to 127. The
built-in function ord returns the number corresponding to an ASCII character; the
function chr returns the ASCII character corresponding to a number:

>>> ord('a') 

97 

>>> chr(97) 

'a'

The ASCII character set has limitations: it does not contain Cyrillic letters, Chinese
ideograms, et cetera. And so, various character sets have been created to handle
various collections of characters. The Unicode character set is the mother of all
character sets. Unicode subsumes ASCII and Latin-1. It also includes all widely used
alphabets, symbols of some ancient languages, and everything but the kitchen sink.

Using Unicode strings 

A Unicode string behaves just like an ordinary string: it has the same methods.
You can denote a string literal as Unicode by prefixing it with a u. You can denote
Unicode characters with \u followed by four hexadecimal digits. For example:

>>> MyUnicodeString=u"Hello"
>>> MyString="Hello"
>>> MyUnicodeString==MyString # Legal comparison
1
>>> MyUnicodeString=u"\ucafe\ubabe"
>>> len(MyUnicodeString)
2
>>> MyString="\ucafe\ubabe" # No special processing!
>>> len(MyString)
12
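In Python 3 every str is a Unicode string, so the u prefix is optional, and a plain string literal does process \u escapes. The following sketch assumes Python 3. It uses a raw string to reproduce the 12-character result above, and it prints only ASCII to stay console-safe.

```python
# Python 3: every str is Unicode, so the u prefix is optional.
print(ord("a"), chr(97))              # 97 a
print(u"Hello" == "Hello")            # True
print(ord("\ucafe") == 0xCAFE)        # True
print(len("\ucafe\ubabe"))            # 2
print(len(r"\ucafe\ubabe"))           # 12: a raw string keeps the backslashes
print(len("\ucafe".encode("utf-8")))  # 3: bytes needed in UTF-8
print("\u00e9".encode("latin-1"))     # b'\xe9': one byte in Latin-1
```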

For a reference on the Unicode character set, and its character categories, see
http://www.unicode.org/Public/UNIDATA/UnicodeData.html.

Reading and writing non-ASCII strings 

You cannot use Unicode characters with an ordinary file object created by the open 
function: 
>>> MyUnicodeString=u"\ucafe\ubabe"
>>> ASCIIFile=open("test.txt","w") # This file can't handle unicode!
>>> ASCIIFile.write(MyUnicodeString)
Traceback (innermost last):
  File "<pyshell#39>", line 1, in ?
    ASCIIFile.write(MyUnicodeString)
UnicodeError: ASCII encoding error: ordinal not in range(128)

The codecs module provides file objects to help read and write Unicode text. 


open(filename,mode[,encoding[,errorhandler[,buffering]]]) 

The function codecs.open returns a file object that can handle the character set
specified by encoding. The encoding parameter is a string specifying the desired
encoding. The errorhandler parameter, which defaults to “strict,” specifies what to
do with errors. The “ignore” handler skips characters not in the character set; the
“strict” handler raises a ValueError for unacceptable characters. The mode and
buffering parameters have the same effect as for the built-in function open.

>>> Bob=codecs.open("test-uni.txt","w","unicode-escape") 

>>> Bob.write(MyUnicodeString) 

>>> Bob.close() 

>>> Bob=codecs.open("test-utf16.txt","w","utf16")

>>> Bob.write(MyUnicodeString) 

>>> Bob.close() 

You should generally read and write files using the same character set, or extreme
garbling can result. The function sys.getdefaultencoding returns the name of
the current default encoding.
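In Python 3 the built-in open accepts encoding and errors arguments directly, so codecs.open is rarely needed there. The following is a sketch under that assumption. round_trip is a hypothetical helper that writes into a temporary directory and reports the text read back and the file size in bytes.

```python
import os
import tempfile

def round_trip(text, encoding, errors="strict"):
    """Write text with the given encoding; return (text read back, size in bytes)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "test.txt")
        with open(path, "w", encoding=encoding, errors=errors) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            read_back = f.read()
        return read_back, os.path.getsize(path)

print(round_trip("\ucafe\ubabe", "utf-8") == ("\ucafe\ubabe", 6))  # True: 3 bytes per character
print(round_trip("caf\u00e9", "ascii", "ignore"))                  # ('caf', 3): the accent is dropped
try:
    round_trip("caf\u00e9", "ascii")
except UnicodeEncodeError as err:   # a subclass of ValueError
    print("strict:", err.reason)    # strict: ordinal not in range(128)
```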


EncodedFile(fileobject,sourceencoding[,fileencoding[,errorhandler]]) 

The function codecs.EncodedFile returns a wrapper object for the file fileobject
to handle character set translation. This function translates data written to the file
from the sourceencoding character set to the fileencoding character set; data read
from the file does the reverse. For example, this code writes a file using UTF-8
encoding, then translates from UTF-8 to escaped Unicode:

>>> UTFFile=codecs.open("utf8.txt","w","utf8")
>>> UTFFile.write(MyUnicodeString)
>>> UTFFile.close()
>>> MyFile=open("utf8.txt","r")
>>> Wrapper=codecs.EncodedFile(MyFile,"unicode-escape","utf8")
>>> Wrapper.read()
'\\uCAFE\\uBABE'
Using the Unicode database

The module unicodedata provides functions to check a character’s meaning in the
Unicode 3.0 character set.

Categorization

These functions give information about a character’s general category:

category(unichr)        Returns a string denoting the category of unichr.
                        For example, underscore has category “Pc” for
                        connector punctuation.

bidirectional(unichr)   Returns a string denoting the bidirectional
                        category of unichr. For example,
                        unicodedata.bidirectional(u"e") is “L,” indicating
                        that “e” is normally written left-to-right.

combining(unichr)       Returns an integer indicating the combining class
                        of unichr. Returns 0 for non-combining characters.

mirrored(unichr)        Returns 1 if unichr is a mirrored character, 0
                        otherwise.

decomposition(unichr)   Returns the character-decomposition string
                        corresponding to unichr, or a blank string if no
                        decomposition exists.


Numeric characters 

These functions give details about numeric characters: 


decimal(unichr[,default])   Returns unichr's decimal value as an integer. If
                            unichr has no decimal value, returns default or (if
                            default is unspecified) raises a ValueError.

numeric(unichr[,default])   Returns unichr's numeric value as a float. If unichr
                            has no numeric value, returns default or (if default
                            is unspecified) raises a ValueError.

digit(unichr[,default])     Returns unichr's digit value as an integer. If unichr
                            has no digit value, returns default or (if default is
                            unspecified) raises a ValueError.
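The module keeps the same function names in Python 3, although its data now tracks a much newer Unicode version than 3.0. The following sketch assumes Python 3. It writes non-ASCII characters as \u escapes and prints only ASCII results.

```python
import unicodedata

print(unicodedata.category("_"))             # Pc (connector punctuation)
print(unicodedata.category("A"))             # Lu (uppercase letter)
print(unicodedata.bidirectional("e"))        # L (left-to-right)
print(unicodedata.combining("\u0301"))       # 230 (combining acute accent)
print(unicodedata.mirrored("("))             # 1
print(unicodedata.decomposition("\u00e9"))   # 0065 0301 (e plus combining acute)
print(repr(unicodedata.decomposition("a")))  # '' : no decomposition
print(unicodedata.decimal("7"))              # 7
print(unicodedata.digit("\u00b2"))           # 2 (superscript two)
print(unicodedata.decimal("\u00b2", None))   # None: a digit, but not a decimal
print(unicodedata.numeric("\u00bd"))         # 0.5 (vulgar fraction one half)
```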
Formatting Floating Point Numbers

The fpformat module provides convenience functions for displaying floating point 
numbers. 

fix(number,precision) 

Formats floating point value number with at least one digit before the decimal point,
and at most precision digits after. The number is rounded to the specified precision 
as needed. If precision is zero, this function returns a string with the number 
rounded to the nearest integer. The parameter number can be either a float, or a 
string that can be passed to the function float. 

sci(number,precision) 

Formats floating point value number in scientific notation: one digit before the
decimal point, and the exponent indicated afterwards. The parameters number and
precision behave as they do for the function fix.

Here are some examples of formatting with fpformat: 

>>> fpformat.fix(3.5,0) 

'4' 

>>> fpformat.fix(3.555,2) 

'3.56' 

>>> fpformat.sci(3.555,2) 

'3.56e+000' 

>>> fpformat.sci("0.03555",2) 

'3.56e-002' 

These functions raise the exception fpformat.NotANumber (a subclass of ValueError)
if the parameter number is not a valid value. The exception argument is the value of
number.
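The fpformat module no longer exists in Python 3. As a rough stand-in, the following sketch (assuming Python 3) defines a hypothetical fix function on top of the decimal module. It rounds the decimal text of the number half up, which reproduces the fix results shown above; it is not fpformat's own code. For scientific notation, the "%.2e" format operator is the usual replacement. Note that its exponent has two digits rather than the three shown above.

```python
from decimal import Decimal, ROUND_HALF_UP

def fix(number, precision):
    """Hypothetical stand-in for fpformat.fix: round the decimal text half up."""
    quantum = Decimal(1).scaleb(-precision)   # 10 ** -precision, e.g. Decimal('0.01')
    rounded = Decimal(str(number)).quantize(quantum, rounding=ROUND_HALF_UP)
    return str(rounded)

print(fix(3.5, 0))         # 4
print(fix(3.555, 2))       # 3.56
print(fix("0.03555", 3))   # 0.036
print(fix(2, 3))           # 2.000
print("%.2e" % 1234.5678)  # 1.23e+03
```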

Summary 

Python offers a full suite of string-manipulation functions. It also provides regular 
expressions, which enable even more powerful searching and replacing. In this 
chapter you: 

♦ Searched, formatted, and modified string objects.

♦ Searched and parsed strings using regular expressions.

♦ Formatted floating point numbers cleanly and easily.

In the next chapter you’ll learn how Python can handle files and directories. 



Working with Files and Directories

CHAPTER 10


Chapter 8 discussed the basics of file input and output in
Python, but the routines covered there assume you
know what file you want to read and write and where it’s
located. This chapter explains operating system features that
Python supports such as finding a list of files that match a
given search pattern, navigating directories, and renaming
and copying files.

This chapter and the next cover many modules, primarily os,
os.path, and sys. Instead of organizing the chapters around
the functions provided in each module, we’ve tried to group
them by feature so that you can find what you need quickly. For
example, you can find a file’s size with
os.stat(filename)[stat.ST_SIZE] or with os.path.getsize(filename)
(something you wouldn’t know unless you read through both
the os and os.path modules), so I cover them in the same section.
Where this is not possible, I’ve added cross-references to
help guide you.
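To confirm that the two approaches agree, the following sketch (assuming Python 3) writes five bytes to a temporary file and asks for its size both ways. file_size_two_ways is a hypothetical helper.

```python
import os
import stat
import tempfile

def file_size_two_ways(path):
    """Return the size reported by os.stat and by os.path.getsize."""
    return os.stat(path)[stat.ST_SIZE], os.path.getsize(path)

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello")
    os.close(fd)
    sizes = file_size_two_ways(path)
finally:
    os.remove(path)

print(sizes)   # (5, 5)
```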

Retrieving File and Directory Information

With the exception of a few oddballs, modern operating systems
let you store files in directories (locations in a named
hierarchy or tree) for better organization. (Just imagine the
mess if everything was in one chaotic lump.) This and the
following sections consider a path to be a directory or file
name. Some paths are relative to another one
(..\temp\bob.txt means go up the tree a step, down into
the temp directory to the file called bob.txt), while others are
absolute (/usr/local/bin/destroystuff tells how to go
from the top of the tree all the way down to destroystuff).


In This Chapter

Retrieving file and directory information

Building and dissecting paths

Listing directories and matching file names

Obtaining environment and argument information

Example: Recursive Grep Utility

Copying, renaming, and removing paths

Creating directories and temporary files

Comparing files and directories

Working with file descriptors

Other file processing techniques
The Secret Identities of os and os.path


The os module contains plenty of functions for performing operating system-ish stuff like
changing directories and removing files, while os.path helps extract directory names, file
names, and extensions from a given path.

The great thing is that these modules work on any Python-supported platform, making your
programs much more portable. For example, to join a directory name with a file name,
using os.path.join makes sure the result is correct for the current operating system:

>>> print os.path.join('maps', 'dungeon12.map')
maps\dungeon12.map     # Result when run on Windows
>>> print os.path.join('maps', 'dungeon12.map')
maps/dungeon12.map     # Result when run on UNIX

To make this happen, each platform defines two modules to do the platform-specific work.
(On Macintosh systems they are mac and macpath; on Windows they're nt and ntpath,
and so on.) When the os module is imported, it looks inside sys.builtin_module_names
for the name of a platform-specific module (such as nt), loads its contents into the os
namespace, and then loads the platform-specific path module and renames it to os.path.

You can check the os.name variable to see which operating system-specific module os
loaded, but you should rarely need to use it. The whole point of os and os.path is to make
your programs blissfully ignorant of the underlying operating system.
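Modern Python 3 still works this way, and both path implementations can be imported on any platform, which makes it easy to see what a path would look like elsewhere. This is a small sketch in Python 3, not code taken from the os module itself:

```python
import os
import ntpath
import posixpath

# Each module implements one platform's rules, but both import anywhere.
print(ntpath.join("maps", "dungeon12.map"))     # maps\dungeon12.map
print(posixpath.join("maps", "dungeon12.map"))  # maps/dungeon12.map

# os.path is simply whichever of the two matches the running platform.
print(os.name, os.path.__name__)
```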


You can choose how you want to access path information: Python provides several
functions to retrieve a single bit of information (does this path exist?) or all of it in
one big glob (give me creation time, last access time, file size, and so forth).

Note Many of the examples in this chapter use file and directory names that may
not exist on your system. Accept the examples on faith or substitute valid file
names of your own (just don't go and erase something important).

The piecemeal approach 

The os.access(path, mode) function tests whether the current process has
permission to read, write, or execute a given path. The mode parameter can be any
combination of os.R_OK (read permission), os.W_OK (write permission), and
os.X_OK (execute permission):

>>> os.access('/usr/local',os.R_OK | os.X_OK)
1 # I have read AND execute permissions...
>>> os.access('/usr/local',os.W_OK)
0 # ...but not write permissions.
You can also use a mode of os.F_OK to test whether the given path exists, or you can
use the os.path.exists(path) function:

>>> os.path.exists('c:\\winnt')   # '\\' to "escape" the backslash
1
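As a minimal Python 3 sketch (where these calls return True and False rather than 1 and 0), the hypothetical helper describe_access runs every check on one path, first on a scratch file and then on the same name after it has been deleted:

```python
import os
import tempfile

def describe_access(path):
    """Report which os.access checks pass for the current process."""
    return {
        "exists": os.access(path, os.F_OK),
        "read": os.access(path, os.R_OK),
        "write": os.access(path, os.W_OK),
        "execute": os.access(path, os.X_OK),
    }

# A file we just created, so we know we can at least read it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
print(describe_access(tmp.name))
print(os.path.exists(tmp.name))        # True

os.remove(tmp.name)
print(os.path.exists(tmp.name))        # False
print(describe_access(tmp.name))       # every check is now False
```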

The inverse of access is os.chmod(path, mode), which lets you set the mode for
the given path. The mode parameter is a number created by adding different octal
values listed in Table 10-1. For example, to give the owner read/write permissions,
group members read permissions, and others no access to a file:

os.chmod('secretPlans.txt',0640)

The first few times you use this function you may forget that the values in Table
10-1 are octal numbers. This is a convention held over from the underlying C
chmod function: in octal, each digit holds the read (4), write (2), and execute (1)
bits for one class of user (owner, group, others), so the values combine neatly by
addition. Remember to put a leading zero on the mode so that Python sees it as an
octal, and not a decimal, number.


Table 10-1
Values for os.chmod

Value   Description
0400    Owner can read the path.
0200    Owner can write the path.
0100    Owner can execute the file or search the directory.
0040    Group members can read the path.
0020    Group members can write the path.
0010    Group members can execute the file or search the directory.
0004    Others can read the path.
0002    Others can write the path.
0001    Others can execute the file or search the directory.


Different operating systems handle permissions differently (Windows, for
example, doesn't really manage file permissions with owners and groups). You
should try a few tests before relying on a particular behavior. Also, consult the
UNIX chmod man page for additional mode values that vary by platform.
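In Python 3 the leading-zero form is a syntax error and octal literals are written 0o640 instead. This sketch builds the same mode from the stat module's named bits, applies it to a scratch file, and reads it back; the read-back only matches exactly on POSIX systems:

```python
import os
import stat
import tempfile

# Python 3 writes octal literals as 0o640; the stat module also names each bit.
mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP   # owner rw, group r
print(oct(mode))                                  # 0o640

fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, mode)
# S_IMODE keeps only the permission bits of st_mode.
actual = stat.S_IMODE(os.stat(path).st_mode)
print(oct(actual))   # 0o640 on POSIX; Windows honors little beyond the write bit
os.remove(path)
```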

The os.path.isabs(path) function returns 1 if the given path is an absolute path.
On UNIX systems, a path is absolute if it starts with a slash (/); on Windows, paths
are absolute if they either start with a backslash or start with a drive letter followed
by a colon and a backslash:



>>> os.path.isabs('c:\\temp')
1
>>> os.path.isabs('temp\\foo')
0
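Because ntpath and posixpath import on any platform, a quick Python 3 sketch can check both sets of rules side by side. One caveat: recent Python 3 releases no longer treat a Windows path that starts with a single backslash but has no drive (such as \temp) as absolute, so this sketch sticks to unambiguous cases:

```python
import ntpath
import posixpath

# Windows rules: drive letter + colon + backslash is absolute.
print(ntpath.isabs("c:\\temp"))       # True
print(ntpath.isabs("temp\\foo"))      # False

# UNIX rules: absolute paths start with a slash.
print(posixpath.isabs("/usr/local"))  # True
print(posixpath.isabs("usr/local"))   # False
```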

The following four functions in the os.path module, isdir(path), isfile(path),
islink(path), and ismount(path), test what kind of file system entry the given
path refers to:

>>> os.path.isdir('c:\\winnt')            # Is it a directory?
1
>>> os.path.isfile('c:\\winnt')           # Is it a normal file?
0
>>> os.path.islink('/usr/X11R6/bin/X')    # Is it a symbolic link?
1
>>> os.path.ismount('c:\\')               # Is it a mount point?
1

On platforms that support symbolic links, isdir and isfile return true if the path
is a link to a directory or file, and the os.readlink(path) function returns the
actual path to which a symbolic link points.

A mount point is essentially where two file systems connect. On UNIX, ismount
returns true if path and path/.. reside on different devices, or if they have the same
inode (as with the root directory, /). On Windows, ismount returns true for paths
like c:\ and \\endor\X.
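Here is a small Python 3 sketch that runs all four tests (plus os.readlink) against a scratch directory. classify is a hypothetical helper, and the symbolic link part is skipped where the platform won't create one:

```python
import os
import tempfile

def classify(path):
    """List the kinds of file-system entry that path refers to."""
    kinds = []
    if os.path.islink(path):
        kinds.append("link")
    if os.path.isdir(path):
        kinds.append("directory")
    if os.path.isfile(path):
        kinds.append("file")
    if os.path.ismount(path):
        kinds.append("mount point")
    return kinds or ["missing"]

top = tempfile.mkdtemp()
target = os.path.join(top, "data.txt")
with open(target, "w") as f:
    f.write("x")

print(classify(top))                         # ['directory']
print(classify(target))                      # ['file']
print(classify(os.path.join(top, "nope")))   # ['missing']

link = os.path.join(top, "alias")
try:
    os.symlink(target, link)
except (OSError, NotImplementedError):
    link = None   # symlinks can be unavailable, e.g. unprivileged Windows
if link:
    print(classify(link))        # ['link', 'file']: isfile follows the link
    print(os.readlink(link))     # the path the link points to
```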

Note An inode is a UNIX file system data structure that holds information about a
file. Each file is uniquely identified by a device number and an inode number.
Some of the following routines may return inode numbers; for UNIX machines these
are valid, but for other platforms they are just dummy values.

You can retrieve a file’s size in bytes using os.path.getsize(path):

>>> os.path.getsize('c:\\winnt\\uninst.exe')
299520 # About 290K

The os.path.getatime(path) and os.path.getmtime(path) functions return
the path’s last access and modification times, respectively, in seconds since the epoch
(you know, New Year’s Eve 1969):

>>> os.path.getmtime('c:\\winnt\\readme.exe')
786178800
>>> os.path.getatime('c:\\winnt\\readme.exe')
956901600
>>> import time
>>> time.ctime(os.path.getatime('c:\\winnt\\readme.exe'))
'Fri Apr 28 00:00:00 2000'
Going the other direction, the os.utime(path, (atime, mtime)) function sets the
time values for the given path. The following example sets the last access and
modification times of a file to noon on March 1, 1977:

>>> sec = time.mktime((1977,3,1,12,0,0,-1,-1,-1))
>>> os.utime('c:\\temp\\foo.txt',(sec,sec))

You can also “touch” a file’s times so that they are set to the current time:

>>> os.utime('c:\\temp\\foo.txt',None)   # Set to current time.
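Pulling these together, here is a Python 3 sketch that writes a scratch file, checks its size, stamps it with the 1977 date, and then touches it. It uses calendar.timegm instead of time.mktime so that the timestamp is noon UTC and identical in every time zone; that substitution is a choice made for this sketch:

```python
import calendar
import os
import tempfile
import time

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"12345")
size = os.path.getsize(path)
print(size)                                  # 5

# Noon on March 1, 1977, computed as UTC (mktime would use local time).
sec = calendar.timegm((1977, 3, 1, 12, 0, 0))
os.utime(path, (sec, sec))
stamped = os.path.getmtime(path)
print(stamped == sec)                        # True
print(time.ctime(stamped))                   # rendered in local time

os.utime(path, None)                         # "touch": both times become now
touched = os.path.getmtime(path)
print(touched > sec)                         # True
os.remove(path)
```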

Cross-Reference See the time module in Chapter 13 for a discussion of its features and a better
definition of the epoch.

UNIX-compatible systems have the os.chown(path, userID, groupID) function, which
changes the ownership of a path to that of a different user and group:

os.chown('grumpy.png',os.getuid(),os.getgid()) 

Chapter 11 covers functions to get and set group and user IDs. 


Non-Windows systems include the os.path.samefile(path1, path2) and
os.path.sameopenfile(fd1, fd2) functions, which return true if the given paths or open
file descriptors refer to the same item on disk (they reside on the same device and have
the same inode).
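A short Python 3 sketch (where, as it happens, both functions also work on Windows) that compares two spellings of one path, a hard link made with os.link, and two open file descriptors; the hard-link step is skipped if the file system refuses it:

```python
import os
import tempfile

top = tempfile.mkdtemp()
original = os.path.join(top, "report.txt")
with open(original, "w") as f:
    f.write("quarterly numbers\n")

# Two spellings of one path refer to the same inode.
roundabout = os.path.join(top, ".", "report.txt")
print(os.path.samefile(original, roundabout))   # True

# A hard link is a second name for the same inode.
alias = os.path.join(top, "alias.txt")
try:
    os.link(original, alias)
except OSError:
    alias = None    # hard links aren't supported on every file system
if alias:
    print(os.path.samefile(original, alias))    # True

# sameopenfile compares open file descriptors rather than names.
fd1 = os.open(original, os.O_RDONLY)
fd2 = os.open(roundabout, os.O_RDONLY)
same_fd = os.path.sameopenfile(fd1, fd2)
print(same_fd)                                  # True
os.close(fd1)
os.close(fd2)
```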

The I-want-it-all approach

If you want to know several pieces of information about a path (for example, you
need to know a file’s size as well as the time it was last modified), the previous
functions are inefficient because each one results in a call to the operating system. The
os.stat(path) function solves this problem by returning a tuple with ten pieces of
information all at once (many of the previous section’s functions quietly call os.stat
behind the scenes and throw away the information you didn’t request):

>>> os.stat('c:\\winnt\\uninst.exe')
(33279, 0, 2, 1, 0, 0, 299520, 974876400, 860551690, 955920365)

Don’t worry too much if the numbers returned look useless! The stat module
provides names (listed in Table 10-2) for indexes into the tuple:

>>> import stat
>>> os.stat('c:\\winnt\\uninst.exe')[stat.ST_SIZE]   # File size
299520 # Hmm... still about 290K
Table 10-2
Index Names for os.stat Tuple

Name       Description
ST_SIZE    File size (in bytes)
ST_ATIME   Time of last access (in seconds since the epoch)
ST_MTIME   Time of last modification (in seconds since the epoch)
ST_MODE    Mode (see below for possible values)
ST_CTIME   On UNIX, time of last status change (modify, chmod, chown, and so on);
           on Windows, creation time
ST_UID     Owner's user ID
ST_GID     Owner's group ID
ST_NLINK   Number of links to the inode
ST_INO     Inode number
ST_DEV     Device the inode resides on
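In Python 3, os.stat returns a result object rather than a plain tuple, but this minimal sketch on a scratch file shows that the stat module's index names from Table 10-2 still work alongside the newer attribute names such as st_size:

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 1024)
os.close(fd)

info = os.stat(path)
# The Table 10-2 indexes still work on Python 3's stat result...
size_by_index = info[stat.ST_SIZE]
# ...which also offers the same fields as named attributes.
print(size_by_index, info.st_size)               # 1024 1024
print(info[stat.ST_MTIME], info.st_mtime)        # integer seconds vs. a float
os.remove(path)
```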


Once you have a path’s mode value (stat.ST_MODE), you can use other
stat-provided functions to test for certain types of path entries (see Table 10-3 for the
complete list):

>>> mode = os.stat('c:\\winnt')[stat.ST_MODE]
>>> stat.S_ISDIR(mode)   # Is it a directory?
1                        # Yes!

Table 10-3
Path Type Test Functions

Function         Returns true for
S_ISREG(mode)    Regular file
S_ISDIR(mode)    Directory
S_ISLNK(mode)    Symbolic link
S_ISFIFO(mode)   FIFO (named pipe)
S_ISSOCK(mode)   Socket
S_ISBLK(mode)    Special block device
S_ISCHR(mode)    Special character device
When you call os.stat with a path to a symbolic link, it returns information about
the path that the link references. The os.lstat(path) function behaves just like
os.stat except that on symbolic links it returns information about the link itself
(although the OS still borrows much of the information from the file it references).
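This Python 3 sketch puts the Table 10-3 functions into a hypothetical kind helper and contrasts os.stat with os.lstat on a symbolic link (the link part is skipped where symlinks can't be created):

```python
import os
import stat
import tempfile

def kind(mode):
    """Name the entry type encoded in a st_mode value (see Table 10-3)."""
    checks = [
        (stat.S_ISREG, "regular file"),
        (stat.S_ISDIR, "directory"),
        (stat.S_ISLNK, "symbolic link"),
        (stat.S_ISFIFO, "FIFO"),
        (stat.S_ISSOCK, "socket"),
        (stat.S_ISBLK, "block device"),
        (stat.S_ISCHR, "character device"),
    ]
    for check, name in checks:
        if check(mode):
            return name
    return "unknown"

top = tempfile.mkdtemp()
target = os.path.join(top, "real.txt")
open(target, "w").close()
print(kind(os.stat(top).st_mode))        # directory
print(kind(os.stat(target).st_mode))     # regular file

link = os.path.join(top, "link.txt")
try:
    os.symlink(target, link)
except (OSError, NotImplementedError):
    link = None   # no symlink support (or no permission) here
if link:
    print(kind(os.stat(link).st_mode))   # regular file: stat follows the link
    print(kind(os.lstat(link).st_mode))  # symbolic link: lstat does not
```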

Cross-Reference See "Working with File Descriptors" later in this chapter for coverage of the
os.fstat function that returns stats for open file descriptors.

On UNIX-compatible systems you can call os.path.samestat(stat1, stat2) to see if
two sets of stats refer to the same file (it compares the device and inode numbers).

The Python standard library also comes with the statcache module, which
behaves just like os.stat but caches the results for later use:

>>> import statcache
>>> statcache.stat('c:\\temp')
(16895, 0, 2, 1, 0, 0, 0, 975999600, 969904112, 969904110)

You can call forget(path) to remove a particular cached entry, or reset() to
remove them all. The forget_prefix(prefix) function removes all entries that
start with a given prefix, and forget_except_prefix(prefix) removes all that do
not start with the prefix (removing a cache entry means a call to stat will have to
check with the operating system again). The forget_dir(prefix) function
removes all entries in a directory, but not in its subdirectories.


Building and Dissecting Paths 

The different path conventions that operating systems follow can make path
manipulation a nuisance. Fortunately, Python has plenty of routines to help.

Joining path parts 

The os.path.join(part[, part...]) function joins any number of path components into
a path valid for the current operating system:

>>> print os.path.join('c:\\','r2d2','c3po','r5d4')
c:\r2d2\c3po\r5d4
>>> print os.path.join(os.pardir,os.pardir,'tmp')
..\..\tmp

Tip The separator character used is defined in os.sep. You can use os.curdir and
os.pardir with join when you want to refer to the current and parent directories,
respectively.
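A quick Python 3 sketch of join's rules, using ntpath and posixpath directly so it gives the same answers on any platform. Note the drive-letter case: joining onto a bare c: produces a drive-relative path with no separator, so write the drive root as c:\\ when you mean the top of the drive:

```python
import ntpath
import posixpath

print(ntpath.join("c:\\", "r2d2", "c3po", "r5d4"))                # c:\r2d2\c3po\r5d4
print(ntpath.join(ntpath.pardir, ntpath.pardir, "tmp"))           # ..\..\tmp
print(posixpath.join(posixpath.pardir, posixpath.pardir, "tmp"))  # ../../tmp

# A bare drive means "the current directory on that drive",
# so no separator is added after it.
print(ntpath.join("c:", "r2d2"))                                  # c:r2d2

# An absolute component throws away everything before it.
print(posixpath.join("/usr", "local", "/etc", "passwd"))          # /etc/passwd
```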
Breaking paths into pieces

Given a path, it’s not too hard to separate it into its pieces (file name, extension,
directory name, and so on) using one of the os.path.split functions:

>>> os.path.split(r'c:\temp\foo.txt')            # Yay, raw strings!
('c:\\temp', 'foo.txt')                           # Split into path and filename.
>>> os.path.splitdrive(r'c:\temp\foo.txt')
('c:', '\\temp\\foo.txt')                         # Split off the drive.
>>> os.path.splitext(r'c:\temp\foo.txt')
('c:\\temp\\foo', '.txt')                         # Split off the extension.
>>> os.path.splitunc(r'\\endor\temp\foo.txt')
('\\\\endor\\temp', '\\foo.txt')                  # Split off machine and mount.

The splitdrive function is present on UNIX systems, but for any path it just returns
the tuple ('', path); the splitunc function is available only on Windows.

The os.path.dirname(path) and os.path.basename(path) functions are
shorthand functions that together return the same information as split:

>>> os.path.dirname(r'c:\temp\foo.txt')
'c:\\temp'
>>> os.path.basename(r'c:\temp\foo.txt')
'foo.txt'
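The same functions can be tried in Python 3 with ntpath, which applies Windows rules on any platform. os.path.splitunc no longer exists in Python 3 (it was removed in 3.7); splitdrive recognizes UNC paths instead, as the last line of this sketch shows:

```python
import ntpath

p = r"c:\temp\foo.txt"
print(ntpath.split(p))        # ('c:\\temp', 'foo.txt')
print(ntpath.splitdrive(p))   # ('c:', '\\temp\\foo.txt')
print(ntpath.splitext(p))     # ('c:\\temp\\foo', '.txt')
print(ntpath.dirname(p))      # c:\temp
print(ntpath.basename(p))     # foo.txt

# UNC paths: the "drive" is the machine name plus the share.
print(ntpath.splitdrive(r"\\endor\temp\foo.txt"))  # ('\\\\endor\\temp', '\\foo.txt')
```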


Other path modifiers 

The os.path.normcase(path) function normalizes the case of a path (makes it all
lowercase on case-insensitive platforms, leaves it unchanged on others) and
replaces forward slashes with backslashes on Windows platforms:

>>> print os.path.normcase('kEwL/lAmeR/hAckUr/d00d')
kewl\lamer\hackur\d00d

The os.path.normpath(path) function normalizes a given path by removing
redundant separator characters and collapsing references to the parent directory
(it also fixes forward slashes for Windows systems):

>>> print os.path.normpath(r'c:\work\\temp\..\..\games')
c:\games

The os.path.abspath(path) function normalizes the path and then converts it to
an absolute path:

>>> os.getcwd()
'/export/home'
>>> os.path.abspath('fred/backup/../temp/cool.py')
'/export/home/fred/temp/cool.py'
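A Python 3 sketch of the three normalizers, with ntpath used for the Windows-specific behavior. Keep in mind that normpath works purely on the string, so collapsing .. can give the wrong answer when a symbolic link is part of the path:

```python
import ntpath
import os
import posixpath

print(ntpath.normcase("kEwL/lAmeR/hAckUr/d00d"))          # kewl\lamer\hackur\d00d
print(posixpath.normcase("kEwL/lAmeR"))                   # kEwL/lAmeR (unchanged)
print(ntpath.normpath(r"c:\work\\temp\..\..\games"))      # c:\games
print(posixpath.normpath("fred/backup/../temp/cool.py"))  # fred/temp/cool.py

# abspath amounts to joining with the current directory, then normalizing.
rel = "fred/backup/../temp/cool.py"
print(os.path.abspath(rel) == os.path.normpath(os.path.join(os.getcwd(), rel)))  # True
```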
The os.path.expandvars(path) function searches the given path for variable
names of the form $varname and ${varname}. If the variables are defined in the
environment, expandvars substitutes in their values, leaving undefined variable
references in place (on Windows, you can use $$ to get a literal $):

>>> os.environ.update({'WORK':'work','BAKFILE':'kill.bak'})
>>> p = os.path.join('$WORK','${BAKFILE}')
>>> print os.path.expandvars(p)
work\kill.bak

The os.path.expanduser(path) function replaces ~ or ~username at the
beginning of a path with the path to the user’s home directory. For ~ (meaning the
current user), expanduser uses the value of the HOME environment variable if
present. On Windows, if HOME is not defined, then it also tries to find and join
HOMEDRIVE and HOMEPATH, returning the original path unchanged if it fails. For
users other than the current user (~username), Windows always returns the
original path and UNIX uses the pwd module to locate that user’s home directory.
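This Python 3 sketch uses posixpath so the $ syntax behaves identically everywhere. expand is a hypothetical helper, and NOT_DEFINED_XYZ is assumed not to be set in your environment:

```python
import os
import posixpath

# Changes to os.environ are visible to expandvars (and to child processes).
os.environ["WORK"] = "work"
os.environ["BAKFILE"] = "kill.bak"

def expand(path):
    """Expand a leading ~ first, then any $VAR or ${VAR} references."""
    return posixpath.expandvars(posixpath.expanduser(path))

print(expand("$WORK/${BAKFILE}"))     # work/kill.bak
print(expand("$NOT_DEFINED_XYZ/x"))   # $NOT_DEFINED_XYZ/x (left in place)
print(expand("~/notes.txt"))          # typically your home directory + /notes.txt
```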

See Chapter 38 to learn more about the pwd module. 



Listing Directories and Matching File Names 

This section lists several ways to retrieve a list of file names, whether they are all the 
files in a particular directory or all the files that match a particular search pattern. 

The os.listdir(dir) function returns a list containing the names of all the entries
(files and subdirectories) in the given directory:

>>> os.listdir('c:\\sierra')
['LAND', 'Half-Life', 'SETUP.EXE']

The dircache module provides its own listdir function that maintains a cache to
increase the performance of repeated calls (and uses the modified time on the
directory to detect when a cache entry needs to be tossed out). It also returns the
names in sorted order:

>>> import dircache
>>> dircache.listdir('c:\\sierra')
['Half-Life', 'LAND', 'SETUP.EXE']

The list returned is a reference, not a copy, so any modifications you make to it
are returned to future callers too. The module also has an annotate(head, list)
function that adds a slash to the end of any entry in the list that is a directory:

>>> x = dircache.listdir('c:\\sierra')[:]   # Make a copy
>>> dircache.annotate('c:\\sierra',x)
>>> x
['Half-Life/', 'LAND/', 'SETUP.EXE']
The head parameter names the directory the entries came from; annotate joins it to
each item in the list so that it can call os.path.isdir on the full path.
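dircache was removed in Python 3. If you want the sorted, annotated listing it produced, a minimal sketch looks like this; annotated_listdir is a hypothetical name, it does no caching, and it builds a fresh list on every call, so there is no shared list to copy first:

```python
import os
import tempfile

def annotated_listdir(path):
    """Sorted listing of path with '/' appended to subdirectory names."""
    names = sorted(os.listdir(path))
    return [name + "/" if os.path.isdir(os.path.join(path, name)) else name
            for name in names]

top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "LAND"))
os.mkdir(os.path.join(top, "Half-Life"))
open(os.path.join(top, "SETUP.EXE"), "w").close()
print(annotated_listdir(top))   # ['Half-Life/', 'LAND/', 'SETUP.EXE']
```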

The os.path.commonprefix(list) function takes a list of paths and returns the
longest prefix that all items have in common (it compares character by character,
so the result isn't necessarily a valid directory):

>>> l = ['c:\\ax\\nine.txt', 'c:\\ax\\ninja.txt', 'c:\\axle']
>>> os.path.commonprefix(l)
'c:\\ax'
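Python 3.5 added os.path.commonpath, which compares whole path components instead of characters. This sketch contrasts the two, using posixpath for the second pair so the results don't depend on the platform:

```python
import ntpath
import posixpath

paths = ["c:\\ax\\nine.txt", "c:\\ax\\ninja.txt", "c:\\axle"]
print(ntpath.commonprefix(paths))          # c:\ax (character by character)

posix_paths = ["/ax/nine.txt", "/ax/ninja.txt", "/axle"]
print(posixpath.commonprefix(posix_paths)) # /ax, though /ax isn't a parent of /axle
print(posixpath.commonpath(posix_paths))   # /  (whole components only)
```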

The os.path.walk(top, func, arg) function walks a directory tree starting at top,
calling func in each directory. The function func should take three arguments: arg
(whatever you passed to arg in the call to walk), dirname (the name of the current
directory being visited), and names (a list of directory entries in this directory).

The following example prints the names of any executable files in the d:\games
directory or any of its subdirectories:

>>> def walkfunc(ext,dir,files):
...     goodFiles = [x for x in files if x.find(ext) != -1]
...     if goodFiles:
...         print dir,goodFiles
...
>>> os.path.walk('d:\\games',walkfunc,'.exe')
d:\games\Half-Life ['10051013.exe']
d:\games\q3a ['quake3.exe']
d:\games\q3a\Extras\cs ['sysinfo.exe']
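os.path.walk is gone from Python 3; its replacement, os.walk, is a generator that yields (dirpath, dirnames, filenames) for each directory. Here is a sketch of the same search on a scratch tree. find_by_extension is hypothetical, and it uses endswith rather than find so that a name like setup.exe.txt isn't mistaken for an executable:

```python
import os
import tempfile

def find_by_extension(top, ext):
    """Yield (directory, sorted matching names) for each directory under top."""
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # sorting in place makes os.walk visit subdirs in order
        good = sorted(name for name in filenames if name.endswith(ext))
        if good:
            yield dirpath, good

top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "q3a", "Extras", "cs"))
for rel in ["q3a/quake3.exe", "q3a/readme.txt", "q3a/Extras/cs/sysinfo.exe"]:
    open(os.path.join(top, *rel.split("/")), "w").close()

for dirpath, names in find_by_extension(top, ".exe"):
    print(os.path.relpath(dirpath, top), names)
```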

With the fnmatch module you can test to see if a file name matches a specific
pattern. Asterisks match everything, and question marks match any single character:

>>> import fnmatch
>>> fnmatch.fnmatch('python','p*n')
1 # It's a match!
>>> fnmatch.fnmatch('python','pyth?n')
1

You can also enclose a set of characters in square brackets to match any one of them
(don't separate them with commas, or the comma becomes one of the choices):

>>> fnmatch.fnmatch('python','p[aeiouy0-9]thon')
1 # Matches p + [any vowel or number] + thon
>>> fnmatch.fnmatch('p5thon','p[aeiouy0-9]thon')
1
>>> fnmatch.fnmatch('p5thon','p[!0-9]thon')
0 # Doesn't match p + [any char EXCEPT a digit] + thon
>>> fnmatch.fnmatch('python','p[!0-9]thon')
1

The fnmatch module also has a fnmatchcase(filename, pattern) function that
forces a case-sensitive comparison regardless of whether or not the file system is
case-sensitive.
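A few of the same patterns in Python 3, where the results are True and False. The sketch also uses fnmatch.filter, which applies one pattern to a whole list of names:

```python
import fnmatch

print(fnmatch.fnmatch("python", "p*n"))                  # True
print(fnmatch.fnmatch("python", "pyth?n"))               # True
print(fnmatch.fnmatch("p5thon", "p[aeiouy0-9]thon"))     # True
print(fnmatch.fnmatch("p5thon", "p[!0-9]thon"))          # False

# fnmatchcase never folds case, whatever the platform.
print(fnmatch.fnmatchcase("Python", "p*n"))              # False

# filter() keeps only the names that match the pattern.
print(fnmatch.filter(["a.py", "b.txt", "c.py"], "*.py"))  # ['a.py', 'c.py']
```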
The glob module takes the fnmatch module a step further by returning all the
paths matching a search pattern you provide:

>>> import glob
>>> for file in glob.glob('c:\\da*\\?ytrack\\s*.*[ye]'):
...     print file
...
c:\dave\pytrack\sdaily.py
c:\dave\pytrack\std.py
c:\dave\pytrack\StkHistinfo.py
c:\dave\mytrack\sdkaccess1.exe
c:\dave\mytrack\sdkaccess2.exe
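To try glob without a c:\dave directory, this Python 3 sketch creates a few empty files in a scratch directory. glob.escape (Python 3.4 and later) keeps any wildcard characters in the directory name itself from being interpreted:

```python
import glob
import os
import tempfile

top = tempfile.mkdtemp()
for name in ["sdaily.py", "std.py", "notes.txt", "sdkaccess1.exe"]:
    open(os.path.join(top, name), "w").close()

# Names starting with s, containing a dot, and ending in y or e.
pattern = os.path.join(glob.escape(top), "s*.*[ye]")
matches = sorted(os.path.basename(p) for p in glob.glob(pattern))
print(matches)   # ['sdaily.py', 'sdkaccess1.exe', 'std.py']
```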

Obtaining Environment and Argument Information

It’s often useful to know a little about the world around Python. This section 
explains how to get and set environment variables, how to discover and change the 
current working directory, and how to read in options from the command line. 


Environment variables 

When you import the os module, it populates a dictionary named environ with all
the environment variables currently in existence. You can use normal dictionary
access to get and set the variables, and child processes or shell commands your
programs execute see any changes you make:

>>> os.environ['SHELL']
'/usr/local/bin/tcsh'
>>> os.environ['BOO'] = str(2 + 2)         # Convert value to string.
>>> print os.popen('echo $BOO').read()     # Use %BOO% on Win32.
4


See Chapter 11 for information on child processes and executing shell commands.

The dictionary used is actually a subclass of UserDict, and requires that the values
you assign be strings.
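In Python 3, assigning a non-string to os.environ raises TypeError. This sketch confirms that a child process sees the new variable; it runs the current interpreter through subprocess.run (capture_output needs Python 3.7 or later) instead of a shell, so it doesn't depend on $BOO versus %BOO% syntax:

```python
import os
import subprocess
import sys

# os.environ only accepts strings, so convert the number first.
os.environ["BOO"] = str(2 + 2)

# Child processes inherit the modified environment.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['BOO'])"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())   # 4
```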



Current working directory 

The current working directory is initially the directory in which you started the
Python interpreter. You can find out what the current directory is and change to
another directory using the os.getcwd() and os.chdir(path) functions:
>>> os.chdir('/usr/home')
>>> os.chdir('..')
>>> os.getcwd()
'/usr'
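Because the working directory is global to the whole process, it's easy to forget to change back. A common pattern is a context manager that restores it; working_directory below is a hypothetical helper (Python 3.11 added contextlib.chdir, which does the same job):

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def working_directory(path):
    """Temporarily chdir into path, restoring the old directory afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)   # runs even if the with-block raises

start = os.getcwd()
target = tempfile.mkdtemp()
with working_directory(target):
    inside = os.getcwd()
print(os.path.samefile(inside, target))   # True
print(os.getcwd() == start)               # True
```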


Command-line parameters 

The sys.argv variable is a list containing the command-line parameters passed to
the program on startup. Save the tiny program in Listing 10-1 to a file called
args.py and try the following example from a command prompt:

C:\temp>args.py pants beable
There are 3 arguments
['C:\\temp\\args.py', 'pants', 'beable']


Listing 10-1: args.py - Display Command-Line Arguments 


#!/usr/bin/env python
# Prints out command-line arguments

import sys

print 'There are %d arguments' % len(sys.argv)
print sys.argv


The sys.argv list always has a length of at least one; as in C, the item at index zero
is the name of the script that is running. If you’re running the Python interpreter in
interactive mode, however, that item is present but is the empty string.


Example: Recursive Grep Utility 

Listing 10-2 combines several of the features covered so far in this chapter to create 
rgrep, a grep-like utility that searches for a string in a list of files in the current 
directory or any subdirectory. The sample output below shows searching for “def” 
in any file that matches the pattern “d*.py” or “h*”: 

D:\Dev\pytrack>\rgrep.py def d*.py h*
D:\Dev\pytrack\dataio.py 185 def __init__(self, sTick):
D:\Dev\pytrack\dataio.py 189 def getData(self):
D:\Dev\pytrack\histInfo.py 9 def sum(self,count,tups,index):
D:\Dev\pytrack\histInfo.py 16 def ave(self,count,tups,index):
D:\Dev\pytrack\old\dataio.py 12 def __init__(self,sTick):
D:\Dev\pytrack\old\dataio.py 16 def getData(self):
Listing 10-2: rgrep.py - Recursive File Search Utility


#!/usr/bin/env python
# Recursively searches for a string in a file or list of files.

import sys, os, fnmatch

def walkFunc(arg,dir,files):
    "Called by os.path.walk to process each dir"
    pattern, masks = arg

    # Cycle through each mask on each file.
    for file in files:
        for mask in masks:
            if fnmatch.fnmatch(file,mask):
                # Filename matches!
                name = os.path.join(dir,file)
                try:
                    # Read the file and search.
                    data = open(name,'rb').read()

                    # Do a quick check.
                    if data.find(pattern) != -1:
                        i = 0
                        data = data.split('\n')

                        # Now a line-by-line check.
                        for line in data:
                            i += 1
                            if line.find(pattern) != -1:
                                print name,i,line
                except (OSError,IOError):
                    pass  # Unreadable entries (such as directories) are skipped.

                break # Stop checking masks.

if __name__ == '__main__':
    if len(sys.argv) < 3:
        print 'Usage: %s pattern file [files...]' % sys.argv[0]
    else:
        try:
            os.path.walk(os.getcwd(),walkFunc,(sys.argv[1],sys.argv[2:]))
        except KeyboardInterrupt:
            print '** Halted **'


Tip UNIX shells usually expand wildcards before your program gets them, so when
running this on UNIX you'd have to put quotes around any command-line parameters
that contain asterisks:

/usr/bin> rgrep.py alligator "*.txt"
You can use rgrep as a starting point for a more powerful search tool. For example,
you could make it accept true regular expressions (as seen in Chapter 9) or make it
support case-insensitive searches too. Although performance is pretty decent, you
could fix the fact that rgrep reads the entire file into memory by reading the files
one piece at a time.
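Here is one way those improvements might look in Python 3. It is a sketch, not a drop-in replacement for Listing 10-2: rgrep is a generator that takes a regular expression, an optional ignore_case flag, and a list of file-name masks, and it reads each file a line at a time. The demo builds a scratch directory instead of searching your own files:

```python
import fnmatch
import os
import re
import tempfile

def rgrep(top, pattern, masks, ignore_case=False):
    """Yield (path, line_number, line) for lines under top matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()
        for name in sorted(filenames):
            if not any(fnmatch.fnmatch(name, mask) for mask in masks):
                continue
            path = os.path.join(dirpath, name)
            try:
                # Reading line by line avoids loading whole files into memory.
                with open(path, encoding="utf-8", errors="replace") as f:
                    for number, line in enumerate(f, 1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                pass  # unreadable file: skip it, as Listing 10-2 does

# Demo on a throwaway tree.
top = tempfile.mkdtemp()
with open(os.path.join(top, "dataio.py"), "w", encoding="utf-8") as f:
    f.write("class DataIO:\n    def getData(self):\n        pass\n")
with open(os.path.join(top, "notes.txt"), "w", encoding="utf-8") as f:
    f.write("DEF is not code\n")

hits = list(rgrep(top, r"\bdef\b", ["d*.py", "*.txt"], ignore_case=True))
for path, number, line in hits:
    print(os.path.basename(path), number, line)
```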


Copying, Renaming, and Removing Paths 

The routines to copy, rename, and remove paths are in the os and shutil modules.
The shutil module aims to provide features normally found in command shells.

Copying and linking 

The shutil.copyfile(src, dest) function copies a file from src to dest;
shutil.copy(src, dest) does about the same thing, except that if dest is a
directory it copies the file into that directory (just like when you copy a file in an
MS-DOS or UNIX shell). copy also copies the permission bits of the file. The
shutil.copy2(src, dest) function is identical to copy except that it also copies the
last access and last modification times of the original file.
shutil.copyfileobj(src, dest[, buflen]) copies the contents of one file-like object to
another, passing the optional buflen parameter to the source object’s read function.

See Chapter 8 for more information on file-like objects.


The shutil.copymode(src, dest) function copies the permission bits of a file
(see os.chmod earlier in this chapter), as does shutil.copystat(src, dest),
which also copies the last access and last modification times.

The shutil.copytree(src, dest[, symlinks]) function uses copy2 to recursively
copy an entire tree. copytree raises an exception if dest already exists. If
the symlinks parameter is 1, any symbolic links in the source tree also become
symbolic links in the new copy of the tree. If symlinks is omitted or equal to zero,
the copy of the tree contains copies of the files referenced by symbolic links.
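A Python 3 sketch of these copy functions in a scratch directory. The source file is given an old timestamp first so you can see that copy2 preserves it while copy doesn't, and the second copytree call shows the exception for an existing destination (FileExistsError in Python 3):

```python
import os
import shutil
import tempfile

top = tempfile.mkdtemp()
src = os.path.join(top, "plans.txt")
with open(src, "w") as f:
    f.write("secret")
os.utime(src, (1000000000, 1000000000))   # give the source an old timestamp

dest_dir = os.path.join(top, "backup")
os.mkdir(dest_dir)

shutil.copyfile(src, os.path.join(top, "plans2.txt"))         # contents only
copied = shutil.copy(src, dest_dir)       # into the directory, plus mode bits
copied2 = shutil.copy2(src, os.path.join(top, "plans3.txt"))  # plus timestamps

print(os.path.basename(copied))                  # plans.txt
print(os.path.getmtime(copied2) == 1000000000)   # True: copy2 kept the time
print(os.path.getmtime(copied) == 1000000000)    # False: copy didn't

tree_copy = os.path.join(top, "backup-copy")
shutil.copytree(dest_dir, tree_copy)
print(os.listdir(tree_copy))                     # ['plans.txt']

refused = False
try:
    shutil.copytree(dest_dir, tree_copy)         # destination exists now
except FileExistsError:
    refused = True
print(refused)                                   # True
```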

On platforms that support links, os.symlink(src,dest) creates a symbolic link to 
src and names it dest, and os.link(src,dest) creates a hard link to src named 
dest. 

Renaming 

The os.rename(old, new) function renames a path, and os.renames(old, new)
renames an entire path from one thing to another, creating new directories as
needed and removing empty ones to clean up when done. For example:

os.renames('cache/logs','/usr/home/dave/backup/0105')
basically moves the logs directory in cache to /usr/home/dave/backup and calls
it 0105. If the cache directory is empty after the move, the function deletes it.
Before the move, renames creates any intermediate directories along the way to
make /usr/home/dave/backup/0105 a valid path. The old and new parameters
can be individual files and not just entire directories.
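A Python 3 sketch of the same move inside a scratch directory, so nothing under /usr is touched. Be aware that renames keeps removing empty parent directories of the old path until it reaches one that isn't empty; here that stops at the scratch directory, which by then holds home:

```python
import os
import tempfile

top = tempfile.mkdtemp()
old = os.path.join(top, "cache", "logs")
os.makedirs(old)
open(os.path.join(old, "today.log"), "w").close()

new = os.path.join(top, "home", "dave", "backup", "0105")
# Creates home/dave/backup, moves logs there as 0105,
# then prunes the now-empty cache directory.
os.renames(old, new)

print(os.listdir(new))                              # ['today.log']
print(os.path.exists(os.path.join(top, "cache")))   # False
```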


Removing 

The os.remove(filename) function deletes a file, os.rmdir(dir) removes an
empty directory, and os.removedirs(dir) removes an empty directory and all
empty parent directories.

If a directory is not empty, neither rmdir nor removedirs removes it. That job is
reserved for shutil.rmtree(path[, ignore_errors[, onerror]]), which
recursively deletes all files in the given directory (including the directory itself) as
well as any subdirectories and their files. ignore_errors is 0 by default; if you
supply a value of 1, then rmtree attempts to continue processing despite any errors
that occur, and won’t bother to tell you about them. You can provide a function in
the onerror parameter to handle any errors that occur. The function must take
three arguments, as shown in this example:

>>> def errFunc(raiser,problemPath,excInfo):
...     print raiser.__name__,'had problems with',problemPath
...
>>> shutil.rmtree('c:\\temp\\foo',0,errFunc)
rmdir had problems with c:\temp\foo\bar\yeah
rmdir had problems with c:\temp\foo\bar
rmdir had problems with c:\temp\foo


The arguments passed to your error function are the function object that raised an
exception, the particular path it had a problem on, and information about the
exception, equivalent to a call to sys.exc_info().



Please be careful with rmtree; it assumes you're smart and trusts your judgment.
If you tell it to erase all the files on your hard drive, it'll obediently do so without
hesitation.


Creating Directories and Temporary Files 

The os.mkdir(dir[, mode]) function creates a new directory. The optional mode
parameter is for the permissions on the new directory, and they follow the form of
those listed for os.chmod in Table 10-1. (If you don’t supply mode, it defaults to
0777, which grants read, write, and execute permissions to everyone; on UNIX, the
process’s umask may then turn some of those bits off.)

The os.makedirs(dir[, mode]) function creates a new directory and any
intermediate directories needed along the way:

>>> os.makedirs(r'c:\a\b\c\d\e\f\g\h\i')
>>> os.removedirs(r'c:\a\b\c\d\e\f\g\h\i')
Even though my computer didn’t have an a directory or an a\b directory, and so
on, makedirs took care of creating them until at last it created i, a subdirectory of
h (and then I used os.removedirs to clean up the mess).
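A small Python 3 sketch that builds a nested tree with makedirs inside a scratch directory, shows rmdir refusing to remove a non-empty directory, and then clears the tree with shutil.rmtree. The exist_ok argument is a Python 3.2 addition that isn't part of the signature described above:

```python
import os
import shutil
import tempfile

top = tempfile.mkdtemp()
deep = os.path.join(top, "a", "b", "c", "d")
os.makedirs(deep)                  # creates a, a/b, a/b/c, and a/b/c/d
os.makedirs(deep, exist_ok=True)   # Python 3.2+: no error if it already exists
open(os.path.join(deep, "keep.txt"), "w").close()

rmdir_refused = False
try:
    os.rmdir(os.path.join(top, "a"))   # only removes empty directories
except OSError:
    rmdir_refused = True
print(rmdir_refused)                   # True

shutil.rmtree(os.path.join(top, "a"))  # removes the whole subtree
print(os.listdir(top))                 # []
```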

The tempfile module helps when you need to use a file as a temporary storage
area for data. In such cases you don’t generally care about a file name or where the
file lives on disk, so tempfile takes care of that for you. Temporary files can help
conserve memory by storing temporary information on disk instead of keeping it all
loaded in memory.

The tempfile.mktemp([suffix]) function returns the absolute path to a unique
temporary file name that does not exist at the time of the call, and includes the suffix
in the file name if you supply it. Although two calls to mktemp won’t return the same
file name, it doesn’t create the file, so it’s possible (although quite unlikely) that if
you wait long enough someone else may create a file by the same name. While it’s
generally safe to use the file name as soon as you get it, it isn’t as safe to save a copy
of the name and then at a later date expect to create a file by that name.

You can set the tempfile.tempdir variable to tell mktemp where to store temporary
files. By default, it tries its best to find a good home for them, first checking the
values of the environment variables $TMPDIR, $TEMP, and $TMP. If none of them are
defined, it then checks if it can create temporary files in known temporary file
safe-havens such as /var/tmp, /usr/tmp, or /tmp on UNIX and c:\temp or \temp
on Windows. If all these fail, it’ll try to use the current working directory.
tempfile.gettempprefix() returns the prefix used in the names of the temporary files
it creates (you can set this value via tempfile.template).

The ultimate in hassle-free temporary files comes from the tempfile.TemporaryFile
class. It gives you a file or file-like object that you can read and
write to without worrying about cleanup when you’re done. You use
tempfile.TemporaryFile([mode[, bufsize[, suffix]]]) to create a new
instance object. The following example figures out how many digits it takes to write
out the numbers from 1 to high. (Of the many better ways to do this, the simplest
improvement is to add the length of each number to a counter instead of
building the entire string and taking its length, but that wouldn’t give me an
opportunity to use TemporaryFile, now would it?)

>>> def digitCount(high):
...     import tempfile
...     f = tempfile.TemporaryFile()
...     for i in range(1,high+1):
...         f.write(str(i))
...     f.flush()
...     f.seek(0)
...     return len(f.read())
...
>>> digitCount(12)
15 # len('123456789101112') = 15
>>> digitCount(100)
192
>>> digitCount(100000)
488895
By default, mode is 'w+b' so you can read and write data and not worry about the
type of data you’re writing (binary or text). The optional bufsize argument gets
passed to the open function, and the optional suffix argument is passed to
mktemp. On UNIX systems, the file doesn’t even have a directory entry, making it
more secure. Other systems delete the temporary file as soon as you call close or
when Python garbage collects the object.
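The same function in Python 3, as a sketch: TemporaryFile still defaults to 'w+b', so each number is encoded to bytes before it is written, and a with statement closes (and therefore deletes) the file:

```python
import tempfile

def digit_count(high):
    """Count the digits needed to write 1..high, deliberately via a scratch file."""
    with tempfile.TemporaryFile() as f:   # mode 'w+b' by default
        for i in range(1, high + 1):
            f.write(str(i).encode("ascii"))
        f.flush()
        f.seek(0)
        return len(f.read())

print(digit_count(12))    # 15
print(digit_count(100))   # 192
```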

On UNIX systems, the os module has three functions for working with temporary
files. os.tmpfile() creates a new file object that you can read and write to. As
with tempfile’s TemporaryFile class, the file has no directory entry and ceases
to exist when you close the file.

The os.tmpnam() function returns an absolute path to a unique file name suitable for use as a temporary file (it doesn't create an actual file). os.tempnam([dir[, prefix]]) does the same as tmpnam except that it enables you to specify the directory in which the file name will live, as well as supply an optional prefix to use in the temporary file's name.
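Python 3 dropped all three of these os functions, and tmpnam and tempnam in particular have a race: another process can claim the name before you get around to creating the file. If you are on Python 3, tempfile.mkstemp is the usual replacement; it creates and opens the file in one step and hands back both a descriptor and the path. A minimal sketch (the suffix and prefix values are arbitrary):

```python
import os
import tempfile

# mkstemp creates the file securely and returns (OS-level descriptor, absolute path).
fd, path = tempfile.mkstemp(suffix='.log', prefix='demo_')
try:
    with os.fdopen(fd, 'w') as f:  # wrap the raw descriptor in a file object
        f.write('scratch data\n')
    print(os.path.basename(path))  # something like demo_k2j4x9.log
    with open(path) as f:
        contents = f.read()
finally:
    os.remove(path)  # unlike TemporaryFile, mkstemp leaves cleanup to you

print(repr(contents))  # 'scratch data\n'
```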


Comparing Files and Directories 

The filecmp module aids in comparing files and directories. To compare two files, call filecmp.cmp(f1, f2[, shallow[, use_statcache]]):

>>> import filecmp
>>> open('one','wt').write('Hey')
>>> open('two','wt').write('Hey')
>>> filecmp.cmp('one','two')
1 # Files match

The shallow parameter defaults to 1, which means that if both are regular files with the same size and modification time, the comparison returns true without reading them. Otherwise (or if shallow is 0), the function compares the contents of the two. The use_statcache parameter defaults to 0, in which case cmp calls os.stat for file information; if it is 1, cmp calls statcache.stat instead.

The filecmp.cmpfiles(a, b, common[, shallow[, use_statcache]]) function takes a list of file names located in two directories (each file is in both directory a and b) and returns a three-tuple containing a list of files that compared equal, a list of those that were different, and a list of files that weren't regular files and therefore weren't compared. The shallow and use_statcache parameters behave the same as for cmp.
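Python 3 removed the use_statcache parameter, but cmp and cmpfiles otherwise work as described. This sketch builds two throwaway directories with made-up file names and contents, then compares them. A name missing from one directory ends up in the third list, because it can't be compared:

```python
import filecmp
import os
import tempfile

# Two scratch directories; the names and contents are hypothetical sample data.
a, b = tempfile.mkdtemp(), tempfile.mkdtemp()
layout = [(a, {'same.txt': 'Hey', 'diff.txt': 'Hey', 'only_a.txt': 'x'}),
          (b, {'same.txt': 'Hey', 'diff.txt': 'Bye!'})]
for directory, files in layout:
    for name, text in files.items():
        with open(os.path.join(directory, name), 'w') as f:
            f.write(text)

# shallow=False forces a content comparison instead of trusting os.stat() alone.
same = filecmp.cmp(os.path.join(a, 'same.txt'), os.path.join(b, 'same.txt'),
                   shallow=False)
print(same)  # True

match, mismatch, errors = filecmp.cmpfiles(
    a, b, ['same.txt', 'diff.txt', 'only_a.txt'], shallow=False)
print(match, mismatch, errors)  # ['same.txt'] ['diff.txt'] ['only_a.txt']
```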

The dircmp class in the filecmp module can help you generate that list of common files, as well as do some other comparison work for you. You use filecmp.dircmp(a, b[, ignore[, hide]]) to create a new instance:


>>> d = filecmp.dircmp('c:\\Program Files', 'd:\\Program Files')
>>> d.report()
diff c:\Program Files d:\Program Files
Only in c:\Program Files : ['Accessories', 'Adobe', ...<snip>
Only in d:\Program Files : ['AnalogX', 'Paint Shop Pro...<snip>
Common subdirectories : ['WinZip', 'Yahoo!', 'work']

The ignore parameter is a list of file names to ignore (it defaults to ['RCS', 'CVS', 'tags']), and hide is a list of file names not to show in the listings (it defaults to [os.curdir, os.pardir], the entries corresponding to the current and parent directories).

The dircmp.report() method prints to standard output a comparison between a and b. dircmp.report_partial_closure() does the same, but also compares common immediate subdirectories. dircmp.report_full_closure() goes the whole nine yards and compares all common subdirectories, no matter how deep.

After you create a dircmp object, you can access any of the attributes listed in Table 10-4 for more information about the comparison.


Table 10-4
Other dircmp Object Attributes

Attribute      Description
left_list      Items in a after being filtered through hide and ignore
right_list     Items in b after being filtered through hide and ignore
common         Items in both a and b
left_only      Items only in a
right_only     Items only in b
common_dirs    Subdirectories found in both a and b
common_files   Files found in both a and b
common_funny   Items found in both a and b, but either the type differs
               between a and b or os.stat reports an error for that item
same_files     common_files that are identical
diff_files     common_files that are different
funny_files    common_files that couldn't be compared
subdirs        Dictionary of dircmp objects; keys are common_dirs
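The following sketch, written for Python 3, builds two small directory trees and reads a few of the attributes from Table 10-4. make_tree is a hypothetical helper written just for this example, and the file names are made up. The CVS directory shows the default ignore list at work:

```python
import filecmp
import os
import tempfile

def make_tree(root, files):
    """Hypothetical helper: create files (relative path -> text) under root."""
    for rel, text in files.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'w') as f:
            f.write(text)

a, b = tempfile.mkdtemp(), tempfile.mkdtemp()
make_tree(a, {'readme.txt': 'hello', 'notes.txt': 'v1', 'src/main.py': 'pass',
              'CVS/Entries': 'ignored by default'})
make_tree(b, {'readme.txt': 'hello', 'notes.txt': 'version 2', 'src/main.py': 'pass',
              'extra.txt': '!'})

d = filecmp.dircmp(a, b)
print(d.left_only)        # [] ('CVS' is in the default ignore list)
print(d.right_only)       # ['extra.txt']
print(d.common_dirs)      # ['src']
print(d.same_files)       # ['readme.txt']
print(d.diff_files)       # ['notes.txt']
print(sorted(d.subdirs))  # ['src']
d.report_full_closure()   # report for a and b, then every common subdirectory
```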


The Python distribution comes with ndiff (Tools/Scripts/ndiff.py), a utility that provides the details of what differs between two files (similar to the UNIX diff and Windows windiff utilities).
Working with File Descriptors

An alternative to using Python's file objects is to use file descriptors, a somewhat lower-level approach to working with files.

General file descriptor functions 

You create a file descriptor with the os.open(file, flags[, mode]) function. For the flags parameter, you can combine (with the bitwise OR operator, |) various values from the next table, Table 10-5, and the mode values are those you pass to os.chmod:

>>> fd = os.open('fumble.txt',os.O_WRONLY | os.O_CREAT)
>>> os.write(fd,'I like fudge')
12 # Wrote 12 bytes.
>>> os.close(fd)
>>> open('fumble.txt').read() # Use the nice Python way.
'I like fudge'

The os.dup(fd) function returns a duplicate of the given descriptor, and os.dup2(fd1, fd2) makes fd2 a duplicate of fd1, closing fd2 first if necessary.

Given a file descriptor, you can use os.fdopen(fd[, mode[, bufsize]]) to create an open Python file object connected to the same file. The optional mode and bufsize arguments are the same as those used for the normal Python open function.


Table 10-5
File Descriptor Open Flags

Name         Description
O_RDONLY     Allow reading only
O_WRONLY     Allow writing only
O_RDWR       Allow reading and writing
O_BINARY     Open in binary mode
O_TEXT       Open in text mode
O_CREAT      Create file if it does not exist
O_EXCL       Return an error if O_CREAT is also given and the file exists
O_TRUNC      Truncate file size to 0
O_APPEND     Append to the end of the file on each write
O_NONBLOCK   Do not block
The os module also has other flags such as O_DSYNC, O_RSYNC, O_SYNC, and O_NOCTTY. Their behavior varies by platform, so you should consult the UNIX open man page for your system for details.
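To see a few of these flags working together, here is a sketch for Python 3, where os.write takes bytes rather than strings. It uses O_EXCL to refuse to overwrite an existing file, O_APPEND to add to the end, and os.fdopen to turn a raw descriptor back into a file object. The file name is arbitrary:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'fumble.txt')

# O_CREAT | O_EXCL: create the file, but fail if it already exists.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
written = os.write(fd, b'I like fudge')  # descriptors deal in bytes in Python 3
os.close(fd)

try:
    os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
except FileExistsError:
    print('fumble.txt already exists')

# O_APPEND: every write lands at the current end of the file.
fd = os.open(path, os.O_WRONLY | os.O_APPEND)
os.write(fd, b'!')
os.close(fd)

# os.fdopen wraps a raw descriptor in an ordinary file object (closing it closes fd).
with os.fdopen(os.open(path, os.O_RDONLY)) as f:
    contents = f.read()
print(written, contents)  # 12 I like fudge!
```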

Cross-Reference The os.openpty function returns two file descriptors for a new pseudo-terminal. See Chapter 38 for details.

The following os file descriptor functions closely mirror their file method counterparts, covered mostly in Chapter 8, "Input and Output":

close(fd)    isatty(fd)    lseek(fd, pos, how)    read(fd, n)
write(fd, str)    fstat(fd)    ftruncate(fd, len)


UNIX systems can use os.ttyname(fd) to retrieve the name of the terminal device the file descriptor represents (if it is a terminal):

>>> os.ttyname(1) # 1 is stdout
'/dev/ttyv1'


Pipes 

A pipe is a communications mechanism through which you can read or write data as if it were a file. You use os.pipe() to create two file descriptors connected via a pipe:

>>> r,w = os.pipe() # One for reading, one for writing
>>> os.write(w,'Pipe dream')
10
>>> os.read(r, 1000)
'Pipe dream'


On UNIX, the os.mkfifo(path[, mode]) function creates a named pipe (FIFO) that 
you can use to communicate between processes. The mode defaults to read and 
write permissions for everyone (0666). After you create the FIFO on disk, you open 
it and read or write to it just like any other file. 
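Because opening a FIFO blocks until both a reader and a writer show up, a single-process demonstration needs a second thread. This UNIX-only sketch for Python 3 (fifo_roundtrip is a hypothetical name) writes from a helper thread and reads in the main thread:

```python
import os
import tempfile
import threading

def fifo_roundtrip(message):
    """Send message through a named pipe and return what the reader sees (UNIX only)."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, 'demo_fifo')
    os.mkfifo(path, 0o600)  # only the owner may read or write

    def writer():
        # Opening a FIFO for writing blocks until a reader opens the other end.
        with open(path, 'w') as f:
            f.write(message)

    t = threading.Thread(target=writer)
    t.start()
    try:
        with open(path) as f:   # blocks until the writer has opened its end
            received = f.read()  # reads until the writer closes (EOF)
    finally:
        t.join()
        os.remove(path)
        os.rmdir(workdir)
    return received

print(fifo_roundtrip('Pipe dream'))  # Pipe dream
```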


Other File Processing Techniques 

The modules below provide alternative methods for operating on file contents. 

Randomly accessing lines in text files 

The linecache module returns any line from any file you want:


>>> import linecache
>>> linecache.getline('linecache.py',5)
'that name.\012'

The first time you request a line from a particular file, linecache reads the file and caches its lines, so future calls for lines from the same file don't have to go back to the disk. Line numbers are 1-based (yes, line one is line one).

If caching too many files makes you nervous, you can call linecache.clearcache() to empty the cache. Also, calling linecache.checkcache() tosses out cached entries that are no longer valid.

Note This module was designed to read lines from modules (Python uses it to print traceback information in exceptions), so if linecache can't find the file you named, it also searches for the file in the module search path.
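Here is a small Python 3 sketch of the caching behavior, using a scratch file instead of a module. Out-of-range line numbers come back as an empty string, and checkcache notices that the file changed on disk (here the file's size changes, which is one of the things it checks):

```python
import linecache
import os
import tempfile

# A scratch file with three lines (the contents are arbitrary).
fd, path = tempfile.mkstemp(suffix='.txt', text=True)
with os.fdopen(fd, 'w') as f:
    f.write('alpha\nbeta\ngamma\n')

second = linecache.getline(path, 2)    # 'beta\n': line numbers start at 1
missing = linecache.getline(path, 99)  # '': out-of-range lines come back empty

# Rewrite the file; its size changes, so checkcache() drops the stale entry.
with open(path, 'w') as f:
    f.write('ALPHA\nBETA\n')
linecache.checkcache(path)
updated = linecache.getline(path, 2)   # 'BETA\n', read fresh from disk

linecache.clearcache()
os.remove(path)
print(repr(second), repr(missing), repr(updated))
```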

Using memory-mapped files 

A memory-mapped file (in the mmap module) behaves like a hybrid of a file and a mutable string. You can access individual characters and slices as well as change them, and you can use memory-mapped files with many routines that expect strings. (The re module, for example, is quite happy to do regular expression searching and matching on a memory-mapped file.) They also work well for routines that operate on files, and you can commit to disk any changes you make to their contents.

When you create a new mmap object, you supply a file descriptor to a file opened for reading and writing and a length parameter specifying the number of bytes from the file the memory map will use:


>>> f = open('mymap','w+b')
>>> f.write('And now for something completely different')
>>> f.flush()
>>> import mmap
>>> m = mmap.mmap(f.fileno(),45) # Use the open file mymap
>>> m[5:10] # It slices.
'ow fo'
>>> m[5:10] = 'ew fi' # It dices.
>>> m[5:10]
'ew fi'
>>> m.flush(); m.close() # But wait, there's more!
1
>>> open('mymap').read()
'And new fir something completely different\000\000\000'

The Windows version for creating a new mmap object accepts an optional third argument, a string that represents the tag name for the mapping (Windows lets you have many mappings for the same file). If you use a mapping that doesn't exist, Python creates a new one; otherwise, the mapping by that name is opened.
The UNIX version optionally takes flags and prot arguments. flags can be either MAP_PRIVATE or MAP_SHARED (the default), signifying that changes are visible only to the current process or are visible to all processes mapping the same portions of the file. The prot argument is the bitwise OR of values specifying the type of protection the mapping has, such as PROT_READ | PROT_WRITE (the default).

Tip Avoid using the optional flags if possible so that your code will work on both Windows and UNIX.

You can use mmap.size() to retrieve the size of an mmap object, and mmap.resize(newsize) to change it:

>>> m.size()
50
>>> m.resize(100)

Call mmap.flush([offset, size]) to save changes to disk. Passing no arguments flushes all changes to disk; otherwise, the memory map flushes only size bytes starting at offset.

Caution Don't forget to flush. If you don't call flush, you have no guarantee that your 
changes will make it to disk. 

All mmap objects have the close(), tell(), seek(), read(num), write(str), readline(), and find(str[, start]) methods, which behave just like their file and string counterparts. The mmap.read_byte() and mmap.write_byte(byte) methods are useful for reading and writing one byte at a time (the bytes are passed and returned as strings of length 1). You can copy data from one location to another within the memory-mapped file using mmap.move(dest, src, count). It copies count bytes from src to dest.
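In Python 3 a memory map works with bytes, and the basic call mmap.mmap(fileno, length) works on both Windows and UNIX. This sketch maps an entire scratch file (a length of 0 means the whole file), then exercises slicing, find, and move before flushing:

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w+b') as f:
    f.write(b'And now for something completely different')
    f.flush()  # make sure the bytes are in the file before mapping it
    # Length 0 maps the whole (non-empty) file.
    with mmap.mmap(f.fileno(), 0) as m:
        sliced = m[4:7]           # b'now': slices return bytes
        m[4:7] = b'new'           # slice assignment must not change the length
        found = m.find(b'thing')  # 16
        m.move(0, 4, 3)           # copy 3 bytes from offset 4 to offset 0
        m.flush()                 # write the changes back to the file

with open(path, 'rb') as f:
    result = f.read()
os.remove(path)
print(sliced, found, result)
# b'now' 16 b'new new for something completely different'
```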

Iterating over several files 

The fileinput module lets you iterate over several files as if they were a single file, eliminating a lot of the housekeeping involved. It is designed for iterating over all the files passed in on the command line, processing each line individually:

>>> import fileinput
>>> for line in fileinput.input():
    print line

The above example iterates over the files listed in sys.argv[1:] and prints out each line. The input([files[, inplace[, backup]]]) function uses the command-line arguments if you don't pass it a files list. Any file (or command-line argument) that is just '-' reads from stdin instead. If the inplace parameter is 1, fileinput copies each file to a backup and routes any output on stdout to the original file, thus enabling in-place modification or filtering of each file. If inplace is 1 and you supply a value for backup (in the form of '.ext'), fileinput uses backup's value as an extension when creating backups of the original files, and it doesn't erase the backups when finished.
While iterating over the files, you can call fileinput.filename() to get the name of the current file, and fileinput.isstdin() to test whether the current file is actually stdin.

The fileinput.lineno() function gives you the overall line number of the line just read, and fileinput.filelineno() returns the number of that line within the current file. You can also call fileinput.isfirstline() to see if it is the first line of that file.

The fileinput.nextfile() function skips the rest of the current file and moves to the next one in the sequence, and fileinput.close() closes the sequence and quits.


Tip You can customize the fileinput functionality by subclassing the fileinput.FileInput class.
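Here is a Python 3 sketch that runs fileinput over two scratch files instead of the command line, recording the per-line bookkeeping functions, and then uses inplace mode so that print output rewrites each file. The file names and contents are made up for the example:

```python
import fileinput
import os
import tempfile

# Two small files to treat as one stream (hypothetical sample data).
workdir = tempfile.mkdtemp()
paths = []
for name, text in [('a.txt', 'one\ntwo\n'), ('b.txt', 'three\n')]:
    path = os.path.join(workdir, name)
    with open(path, 'w') as f:
        f.write(text)
    paths.append(path)

report = []
with fileinput.input(files=paths) as stream:
    for line in stream:
        report.append((os.path.basename(stream.filename()),
                       stream.lineno(),      # overall line number
                       stream.filelineno(),  # line number within this file
                       stream.isfirstline(),
                       line.rstrip('\n')))
for row in report:
    print(row)

# inplace=True: anything printed replaces the current file's contents.
with fileinput.input(files=paths, inplace=True) as stream:
    for line in stream:
        print(line.upper(), end='')

with open(paths[0]) as f:
    upper_a = f.read()
print(repr(upper_a))  # 'ONE\nTWO\n'
```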

Summary 

Python gives you a full toolbox of high-level functions to manipulate files and paths. 
In this chapter you learned to: 

- Manipulate paths and retrieve file and directory information.

- Traverse directory trees and match file names to search patterns.

- Create and destroy directories and temporary files.

- Use file descriptors.

The next chapter covers more of Python's operating system features. You'll learn to access process information, start child processes, and run shell commands.




Using Other Operating System Services

This chapter finishes coverage of Python's main operating system services. One of the main points of focus is working outside the boundaries in which the interpreter is running. After you're done with this chapter you'll be able to execute commands in a sub-shell or spawn off an entirely new process.

Executing Shell Commands and Other Programs

The simplest way to execute a shell command is with the os.system(cmd) function (which is just a wrapper for the C system function). The following example uses the shell command echo to write contents to a file, including an environment variable set from within the Python interpreter:

>>> import os
>>> os.environ['GRUB'] = 'spam!'
>>> os.system('echo Mmm, %GRUB% > mm.txt') # Use $GRUB on UNIX
0
>>> print open('mm.txt').read()
Mmm, spam!

The return values vary by system and command, but 0 generally means the command executed successfully.

Unfortunately, os.system has some limitations. On Windows, your command runs in a separate MS-DOS window that rears its ugly head until the command is done, and on all operating systems it's kind of a pain to retrieve the output from the command (especially if the output is on both stdout and stderr). The next section shows how to get around this using the much cleaner calls to os.popen and friends.

Windows systems can use os.startfile(path) to launch a program by sending a file to the program associated with its file type. For example, if the current directory has a file called yoddle.html, you can launch a Web browser to view that file like this:

>>> os.startfile('yoddle.html')

The os.exec family of functions executes another program, but in doing so replaces the current process: a successful exec call never returns, so your program doesn't continue. Instead, the new program takes over the process your program was running in. Each of the exec functions comes in two versions: one that accepts a variable number of arguments and one that takes all the program's arguments in a list or tuple. All arguments are strings, and you always need to provide argument 0, which is just the name of the program being executed.

The os.execv(path, args) and os.execl(path, arg0, arg1, ...) functions execute the program pointed to by path and pass it the arguments. The following example shuts down the Python interpreter and launches the Windows calculator (the location of the calc program may vary):

>>> os.execv('c:\\winnt\\system32\\calc',['calc'])

The os.execvp(file, args) and os.execlp(file, arg0, arg1, ...) functions work the same as execv, except they look in the PATH environment variable to find the executable, so you don't have to name its absolute path. This example calls another Python interpreter, telling it to just print out a message. Note the use of the variable-argument form (execlp) and that you still have to list the program twice, once for the file argument and once as argument 0:

>>> os.execlp('python','python','-c','"print \'Goodbye!\'"')

If you need to modify the PATH environment variable, you can use os.defpath to see the default PATH used if it isn't set in the environment. os.pathsep is the separator character used between each directory listed in the PATH variable.

The os.execve(path, args, env) and os.execle(path, arg0, arg1, ..., env) functions are also like execv, except that you pass in a dictionary containing all the environment variables to be defined for the new program. The dictionary should contain string keys mapping to string values.

The final exec functions, os.execvpe(file, args, env) and os.execlpe(file, arg0, arg1, ..., env), are like execve and execvp combined. You pass in a file name instead of an absolute path because the functions search through the path for you, and you also pass in a dictionary of environment variables to use.
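Because a successful exec never returns, the easiest way to watch one is from the outside. This Python 3 sketch starts a child interpreter that calls os.execvpe to replace itself with a fresh interpreter and a custom environment; the GREETING variable name is made up for the example. It assumes UNIX, since the exec functions behave differently on Windows (the original process exits while a new one starts):

```python
import subprocess
import sys

# The child replaces itself via os.execvpe; nothing after that call runs,
# because the process image is replaced by the new interpreter.
child_code = '''
import os, sys
env = dict(os.environ, GREETING='Goodbye!')
os.execvpe(sys.executable,
           [sys.executable, '-c', 'import os; print(os.environ["GREETING"])'],
           env)
print('never reached')
'''

result = subprocess.run([sys.executable, '-c', child_code],
                        capture_output=True, text=True)
print(result.stdout.strip())  # Goodbye!
```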
Note You don't really have to name the program twice for the exec calls. When supplying a value for argument 0, you can actually use any value you want. Be advised, however, that some programs (like gzip and gunzip) may expect argument 0 to have certain values.


Spawning Child Processes 

Depending on your needs, you can start child processes using the popen, spawn, 
and fork functions. 


popen functions 

The popen family of functions opens pipes to communicate with a child process. 


The os.popen(cmd[, mode[, bufsize]]) function opens a single pipe to read from or write to another process. You pass in the command to execute in the cmd parameter, followed by an optional mode parameter to tell whether you'll be reading ('r') or writing ('w') with the pipe. An optional third parameter is a buffer size like the one used in the built-in open function. popen returns a file object ready for use:


>>> a = os.popen('dir /w /ad e:\\') # Mode defaults to 'r'.
>>> print a.read()
 Volume in drive E has no label.
 Volume Serial Number is 2C40-1AF5

 Directory of e:\

[RACER]        [FlaskMPEG]    [VNC]          [AnalogX]
[maxdev]       [Diablo II]    [dxsdk]        [Python20]
[VideoDub]     [archive]      [VMware]


The close() method of the file object returns None if the command was successful, or an error code if the command was unsuccessful.

The os.popen2(cmd[, bufsize[, mode]]) function is a more flexible alternative to popen; it returns the two-tuple (stdin, stdout) containing the standard input and output of the child process (the mode parameter is 't' for text or 'b' for binary). The following example uses the external program grep to look through lines of text and print any that have a colon character in them:

>>> someText = """
... def printEvents():
...     for i in range(100):
...         if i % 2 == 0:
...             print i
... """
>>> w,r = os.popen2('grep ":"') # Grep for lines with colons.
>>> w.write(someText)
>>> w.close()
>>> print r.read()
def printEvents():
    for i in range(100):
        if i % 2 == 0:


Tip Depending on the program you execute, you often need to flush or even close stdin of the child process in order to have it produce its output.

The os.popen3(cmd[, bufsize[, mode]]) function does the same work as popen2 but instead returns the three-tuple (stdin, stdout, stderr) of the child process. os.popen4(cmd[, bufsize[, mode]]) does the same except that it returns the output of stdout and stderr together in a single stream for convenience. This function is a great way to execute arbitrary shell commands cleanly because you have to look in only one place for the output, and no matter what the command is, your users won't see error output sneaking past you and onto the screen. And on Windows systems, you don't get the ugly MS-DOS window while your command executes:


>>> w,r = os.popen4('iblahblahasdfasdfr *.foo')
>>> print r.read()
'iblahblahasdfasdfr' is not recognized as an internal or
external command, operable program or batch file.



The popen2, popen3, and popen4 functions were new in Python 2.0. 
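The popen2, popen3, and popen4 functions did not survive into Python 3; the subprocess module (added in Python 2.4) took over their job. As a hedged modern equivalent of the grep example above, this sketch uses a tiny child Python script as the filter, so it does not depend on grep being installed, and then shows a popen4-style merge of stderr into stdout:

```python
import subprocess
import sys

some_text = '''
def printEvents():
    for i in range(100):
        if i % 2 == 0:
            print(i)
'''

# A small child Python script stands in for grep ':' so the sketch is portable.
filter_code = ('import sys\n'
               'for line in sys.stdin:\n'
               '    if ":" in line:\n'
               '        sys.stdout.write(line)\n')
proc = subprocess.run([sys.executable, '-c', filter_code],
                      input=some_text, capture_output=True, text=True)
print(proc.stdout)

# popen4-style: send stderr into the same pipe as stdout.
noisy = 'import sys; print("out"); sys.stderr.write("err\\n")'
merged = subprocess.run([sys.executable, '-c', noisy],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
print(merged.stdout)  # both lines, though buffering can change their order
```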


spawn functions 

The spawn functions start a child process that doesn't replace the current process (as the exec functions do) unless specifically asked to. For example, to start up another Python interpreter (assuming it lives in D:\Python20) without stopping the current one:

>>> os.spawnl(os.P_NOWAIT, 'd:\\python20\\python', 'python')
400 # Process ID of new interpreter

Like the exec functions, the spawn functions have many variations, as shown in the following paragraphs.

os.spawnv(mode, path, args) and os.spawnl(mode, path, arg0, arg1, ...) start a new child process.

os.spawnve(mode, path, args, env) and os.spawnle(mode, path, arg0, arg1, ..., env) start a child process using the environment variables contained in the dictionary env.
On UNIX systems, variants of each of the above functions search the current path for the program to execute, and are named spawnlp, spawnlpe, spawnvp, and spawnvpe.

The arguments passed in should include the program name for argument 0. A mode of os.P_WAIT forces the current thread to wait until the child process ends. os.P_NOWAIT runs the child process concurrently, and os.P_OVERLAY terminates the calling process before running the child process (making it identical to the exec functions). os.P_DETACH also runs the process concurrently, but in the background, where it has no access to the console or the keyboard.

When you start a child process concurrently, the spawn function returns the process ID of the child process. If you use os.P_WAIT instead, the function returns the exit code of the child once the child process finally quits.
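Here is a short Python 3 sketch of the two common modes, using the running interpreter (sys.executable) as the child program. It assumes UNIX: on Windows, spawn builds a single command line from the arguments, so the -c code below may need extra quoting there.

```python
import os
import sys

# P_WAIT: block until the child exits and return its exit code.
code = os.spawnv(os.P_WAIT, sys.executable,
                 [sys.executable, '-c', 'import sys; sys.exit(7)'])
print(code)  # 7

# P_NOWAIT: return right away with the child's process ID.
pid = os.spawnv(os.P_NOWAIT, sys.executable, [sys.executable, '-c', 'pass'])
print(pid > 0)  # True
os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
```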

fork 

The os.fork() function (available on UNIX systems) creates a new process that is a duplicate of the current process. To distinguish between the two processes, os.fork() returns 0 in the child process, and in the parent process it returns the process ID of the child:

>>> def forkFunc():
    pid = os.fork()
    if pid == 0:
        print 'I am the child!'
        os._exit(0)
    else:
        print 'I am the parent. Child PID is',pid

>>> forkFunc()
I am the parent. Child PID is 1844
I am the child!

Notice that the child process can force itself to terminate by calling 
os._exit(status), which terminates a process without the usual cleaning up 
(which is good because the parent and child processes access some of the same 
resources, such as open file descriptors). 
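A forked child often needs to send results back to its parent. This UNIX-only Python 3 sketch (fork_and_collect is a hypothetical name) pairs fork with os.pipe, and uses os.waitpid (covered later in this chapter) to collect the child's exit code:

```python
import os

def fork_and_collect():
    """Fork a child that reports through a pipe; return (message, exit code). UNIX only."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it has its own copies of both descriptors.
        os.close(r)
        os.write(w, b'I am the child!')
        os.close(w)
        os._exit(3)  # exit at once, without running cleanup inherited from the parent
    # Parent: close its write end so read() sees EOF once the child closes its copy.
    os.close(w)
    chunks = []
    while True:
        data = os.read(r, 1024)
        if not data:
            break
        chunks.append(data)
    os.close(r)
    _, status = os.waitpid(pid, 0)
    return b''.join(chunks).decode(), os.WEXITSTATUS(status)

print(fork_and_collect())  # ('I am the child!', 3)
```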

Cross-Reference Chapter 38 has information on the pty (pseudo-terminal) module, its fork and spawn functions, and the os.forkpty function.

Process management and termination 

When you call os._exit() to end a process, Python skips the normal cleanup operations. The normal way to end the current process is by calling sys.exit([status]). The status parameter can be a numerical status code that Python returns to the parent process (by convention, 0 for success and nonzero for an error), or any other object. For non-numeric objects, sys.exit prints the object to stderr and then exits with a status code of 1, making it a useful way for programs to exit when users supply invalid command-line arguments:

>>> import sys 

>>> sys.exit('Usage: zapper [-force]') 

Usage: zapper [-force] 

C:\> 
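You can confirm both halves of that behavior (the message on stderr and the status code of 1) by running the call in a child interpreter. A Python 3 sketch:

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, '-c', "import sys; sys.exit('Usage: zapper [-force]')"],
    capture_output=True, text=True)
print(result.returncode)      # 1
print(result.stderr.strip())  # Usage: zapper [-force]
```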


Other ways to shut down 

Another way to terminate the current process is by raising the SystemExit exception (which is what sys.exit does anyway). You can cause the process to terminate abnormally by calling os.abort(), which sends the process a SIGABRT signal.

The atexit module provides a way for you to register cleanup functions for Python to call when the interpreter is shutting down normally. You can register multiple functions, and Python calls them in the reverse order of how you registered them. Use atexit.register(func[, args]) to register each function, where args are any arguments (normal or keyword) that you want sent to the function:


>>> import atexit
>>> def bye(msg):
    print msg

>>> def allDone(*args):
    print 'Here are my args:',args

>>> atexit.register(bye,"I'm melting!")
>>> atexit.register(allDone,1,2,3)
>>> raise SystemExit # Shut down.
Here are my args: (1, 2, 3)
I'm melting!




New Feature The atexit module was new in Python 2.0.
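Because atexit handlers run only at interpreter shutdown, the easiest way to watch them is in a child interpreter. This Python 3 sketch repeats the example above (with print as a function, and all_done as an illustrative name) and also passes a keyword argument through register:

```python
import subprocess
import sys

child = '''
import atexit

def bye(msg):
    print(msg)

def all_done(*args, **kwargs):
    print('Here are my args:', args, kwargs)

atexit.register(bye, "I'm melting!")
atexit.register(all_done, 1, 2, 3, sep='-')
raise SystemExit  # normal shutdown, so the handlers run (last registered first)
'''

out = subprocess.run([sys.executable, '-c', child],
                     capture_output=True, text=True).stdout
print(out)
# Here are my args: (1, 2, 3) {'sep': '-'}
# I'm melting!
```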

Waiting around 

On UNIX systems, you can call os.wait() to wait for any child process to stop or terminate, or os.waitpid(pid, option) to wait for a particular child process. The values available for the option parameter vary by system, but you can always use os.WNOHANG to tell waitpid to return immediately if no processes have a termination to report, or 0 to wait. The wait functions return a two-tuple (pid, status), and you can decipher the status using any of the os functions listed in Table 11-1. The following example forks off a child process that sleeps for five seconds and then exits. The parent waits until the child finishes and then prints the exit information for the child:
>>> import os,time
>>> def useless():
    z = os.fork()
    if z == 0:
        for i in range(5):
            time.sleep(1)
        os._exit(5)
    else:
        print 'Waiting on',z
        status = os.waitpid(z,0)[1]
        print 'Exited normally:',os.WIFEXITED(status)
        print 'Exit code:',os.WEXITSTATUS(status)

>>> useless()
Waiting on 1915
Exited normally: 1
Exit code: 5


Table 11-1
Wait Status Interpretation Functions

Function               Value returned
WIFSTOPPED(status)     1 if process was stopped (and not terminated)
WSTOPSIG(status)       Signal that stopped the process if WIFSTOPPED was true
WIFSIGNALED(status)    1 if process was terminated due to a signal
WTERMSIG(status)       Signal that terminated the process if WIFSIGNALED was true
WIFEXITED(status)      1 if the process exited due to _exit() or exit()
WEXITSTATUS(status)    Status code if WIFEXITED was true
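The functions in Table 11-1 combine naturally into a single status decoder. This UNIX-only Python 3 sketch (describe and run_child are hypothetical helper names) forks two children, one that exits with a code and one that kills itself with SIGTERM, and reports how each ended:

```python
import os
import signal

def describe(status):
    """Turn a raw wait status into text using the Table 11-1 functions (UNIX only)."""
    if os.WIFEXITED(status):
        return 'exited with code %d' % os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return 'killed by signal %d' % os.WTERMSIG(status)
    if os.WIFSTOPPED(status):
        return 'stopped by signal %d' % os.WSTOPSIG(status)
    return 'unrecognized status %r' % status

def run_child(action):
    """Fork, run action() in the child, and describe how the child ended."""
    pid = os.fork()
    if pid == 0:
        action()
        os._exit(0)  # reached only if action() returns
    _, status = os.waitpid(pid, 0)
    return describe(status)

print(run_child(lambda: os._exit(5)))                           # exited with code 5
print(run_child(lambda: os.kill(os.getpid(), signal.SIGTERM)))  # killed by signal 15 on most systems
```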


Cross-Reference Instead of spawning off separate processes to do your bidding, you may just need to use threads. Chapter 26 covers multithreaded Python programs.


Handling Process Information

Table 11-2 lists the plethora of functions in the os module for getting and setting information about the current process. Except where noted, the functions are available only on UNIX.
Table 11-2
Process Information Functions in os

Function                  Description
getpid()                  Gets the current process ID (Windows and UNIX).
getppid()                 Gets the parent process ID.
getegid() / setegid(id)   Gets/sets effective group ID.
getgid() / setgid(id)     Gets/sets group ID.
getuid() / setuid(id)     Gets/sets user ID.
geteuid() / seteuid(id)   Gets/sets effective user ID.
getpgrp() / setpgrp()     Gets/sets process group ID.
ctermid()                 Gets the file name of the controlling terminal.
getgroups()               Gets list of group IDs for this process.
getlogin()                Gets actual login name for current process.
setpgid(pid, pgrp)        Sets the process group for process pid (or the current
                          process if pid is 0).
setreuid(ruid, euid)      Sets real and effective user IDs for the current process.
setregid(rgid, egid)      Sets real and effective group IDs for the current process.
tcgetpgrp(fd)             Gets the process group ID associated with fd (an open
                          file descriptor of a terminal device).
tcsetpgrp(fd, pg)         Sets the process group ID associated with fd (an open
                          file descriptor of a terminal device).
setsid()                  Creates a new session and process group. The calling
                          process is the group leader of the new process group.
umask(mask)               Sets the process's file mode creation mask and returns
                          the previous mask (Windows and UNIX).
nice(inc)                 Adds inc to the process's nice value. The more you
                          add, the lower the scheduling priority of that process
                          (nicer means less important to the task scheduler).


For example, the following gets the current process's ID:

>>> os.getpid()
1072 # Hi, I'm process 1072.
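Here is a Python 3 sketch of a few of these calls, chosen because they also exist on Windows in current Python versions: getpid, getppid (checked against a child process), and umask, whose habit of returning the old mask makes it easy to restore:

```python
import os
import subprocess
import sys

my_pid = os.getpid()
print(my_pid)  # this interpreter's process ID; the number varies from run to run

# A child we start reports our PID as its parent process ID.
child_ppid = int(subprocess.run(
    [sys.executable, '-c', 'import os; print(os.getppid())'],
    capture_output=True, text=True).stdout)
print(child_ppid == my_pid)  # True

# umask() installs a new mask and returns the previous one, so restore it afterward.
old_mask = os.umask(0o077)
installed = os.umask(old_mask)
print(oct(installed))  # 0o77
```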
Retrieving System Information

Many programs don't need to know much about the platform on which they run, but when they do need to know, there's plenty of information available to them:


>>> import os, sys
>>> os.name # Name of the os module implementation
'posix'
>>> sys.byteorder # Is the processor big or little endian?
'little'
>>> sys.platform # Platform identifier
'freebsd3'
>>> os.uname() # UNIX only
('FreeBSD', '3.4-RELEASE', 'FreeBSD 3.4-RELEASE #0','i386')


The five-tuple returned by os.uname is (sysname, nodename, release, version, machine).

Cross-Reference See Chapter 38 for coverage of the UNIX statvfs module, useful for retrieving file system information.

UNIX system configuration information is available through os.confstr, os.sysconf, os.pathconf, and os.fpathconf:

os.confstr(name): Returns the string value for the specified configuration item; the list of items defined for the current platform is in os.confstr_names.

os.sysconf(name): Similar to os.confstr(name) except that the values os.sysconf(name) returns are integers. os.sysconf_names lists the names of the items you can retrieve.

os.pathconf(path, name) and os.fpathconf(fd, name): Return system configuration information relating to a specific path or an open file descriptor. os.pathconf_names lists valid names.


For example, to retrieve the system memory page size you can use the following: 

>>> os.sysconf('SC_PAGESIZE') 

8192 
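Here is a Python 3 sketch that gathers the same kind of information portably, checking that each UNIX-only call exists (and that each configuration name is defined) before using it. The particular sysconf and pathconf names are just examples:

```python
import os
import sys

print(os.name)        # e.g. 'posix' or 'nt'
print(sys.byteorder)  # 'little' or 'big'
print(sys.platform)   # e.g. 'linux', 'darwin', or 'win32'

if hasattr(os, 'uname'):  # UNIX only; Python 3 returns a named tuple
    info = os.uname()
    print(info.sysname, info.release, info.machine)

if hasattr(os, 'sysconf'):  # UNIX only
    # Ask only for names this platform defines; unknown names raise ValueError.
    if 'SC_PAGESIZE' in os.sysconf_names:
        print('page size:', os.sysconf('SC_PAGESIZE'))
    if 'SC_NPROCESSORS_ONLN' in os.sysconf_names:
        print('CPUs online:', os.sysconf('SC_NPROCESSORS_ONLN'))
    if 'PC_NAME_MAX' in os.pathconf_names:
        print('longest file name here:', os.pathconf('.', 'PC_NAME_MAX'))
```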


Cross-Reference Chapter 37 covers the winreg module that lets you access system information stored in the Windows registry.
Managing Configuration Files

The ConfigParser module makes reading and writing configuration files simple. Users can simply edit the configuration files to set various run-time options to customize your program's behavior. The config files are normal text files, organized into sections that contain key-value pairs. The files can have comments and can contain variables that ConfigParser evaluates when your program accesses them. If you save the file shown in Listing 11-1 to your current working directory as sample.cfg, you can then follow along with the examples.


Listing 11-1: sample.cfg - Sample Configuration File

# This listing is a sample configuration file.
# Comment lines start with pound symbols or semicolons.

[Server]
Address=171.15.2.5
Port=50002

[Hoth]
ID: %(team)s-1
Team=gold
DefaultName=%(__name__)s_User


Notice that the file can contain blank and comment lines, and that key-value pairs can be separated by equal signs or colons. A value can be anything, and you can use variable substitution to create values from other values. For example, %(team)s evaluates to the value of the team variable, and %(__name__)s evaluates to the name of the current section. If ConfigParser does not find a variable name in the current section, it also looks in a section named DEFAULT. The variable name in parentheses should be lowercase.

You create a ConfigParser by calling ConfigParser.ConfigParser([defaults]), where defaults is an optional dictionary containing values for the DEFAULT section. The readfp(f[, filename]) method reads a config file from an open filelike object. If the filelike object has a name attribute, ConfigParser uses that for the config file's name (some exceptions it raises include the file name). You can also pass in an optional file name to use. The read(filenames) method reads in the contents of one or more config files. It fails silently on nonexistent files, making it safe to pass in a list of potential config files that may or may not exist:

>>> import ConfigParser
>>> cfg = ConfigParser.ConfigParser()
>>> cfg.read('sample.cfg')
>>> cfg.sections()
['Server', 'Hoth']
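For comparison, here is a sketch of the same steps in Python 3, where the module is spelled configparser. It assumes a temporary directory rather than your working directory, leaves out the DefaultName line (the %(__name__)s variable belongs to the older module and may not be honored by Python 3), and adds a missing file to the list to show read() skipping it. load_sample and SAMPLE are hypothetical names:

```python
import configparser
import os
import tempfile

SAMPLE = """\
# Comment lines start with pound symbols or semicolons.
[Server]
Address=171.15.2.5
Port=50002

[Hoth]
ID: %(team)s-1
Team=gold
"""

def load_sample():
    """Write SAMPLE to a temp file, then read it plus one nonexistent file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.cfg")
        with open(path, "w") as f:
            f.write(SAMPLE)
        cfg = configparser.ConfigParser()
        # read() skips files that do not exist and returns the ones it parsed.
        parsed = cfg.read([os.path.join(tmp, "missing.cfg"), path])
        return cfg, [os.path.basename(p) for p in parsed]

cfg, parsed = load_sample()
```

Python 3's read() returns the list of files it actually parsed, which is a convenient way to report which candidate config files were found.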
When ConfigParser encounters an error while reading a file or retrieving values, it raises one of the exceptions listed in Table 11-3.



Table 11-3
ConfigParser Exceptions

Exception                   Raised when
NoSectionError              The specified section does not exist.
DuplicateSectionError       A section with the specified name already exists.
NoOptionError               An option with the specified name does not exist.
InterpolationError          A problem occurred while performing variable evaluation.
InterpolationDepthError     The variable evaluation required too many recursive substitutions.
MissingSectionHeaderError   A key-value pair is not part of any section.
ParsingError                ConfigParser encountered a syntactic problem not covered by any of the other exceptions.
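Catching these exceptions is the usual way to supply fallback values. The following Python 3 sketch (module name configparser) shows a hypothetical lookup helper that turns NoSectionError and NoOptionError into a default, and triggers DuplicateSectionError and MissingSectionHeaderError on purpose:

```python
import configparser

def lookup(cfg, section, option, default=None):
    """Return the option's value, or default if the section or option is missing."""
    try:
        return cfg.get(section, option)
    except (configparser.NoSectionError, configparser.NoOptionError):
        return default

cfg = configparser.ConfigParser()
cfg.read_string("[Server]\nport = 50002\n")

try:
    cfg.add_section("Server")          # this section already exists
    duplicate_raised = False
except configparser.DuplicateSectionError:
    duplicate_raised = True

try:
    configparser.ConfigParser().read_string("port = 1\n")  # no [section] header
    header_raised = False
except configparser.MissingSectionHeaderError:
    header_raised = True
```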


Once you have a valid ConfigParser instance object, you can use its methods to get and set values or learn more about the configuration file. The defaults() method returns a dictionary containing the default key-value pairs for this instance. sections() returns a list of section names for this config file (not including DEFAULT), and has_section(section) is a quick way to see if a given section exists. For any section, the options(section) method returns a list of options in that section, and has_option(section, option) tests for the existence of a particular option in that section:

>>> cfg.has_option('Server', 'port')
1
>>> cfg.options('Server')
['address', 'port']

Use the get(section, option[, raw[, vars]]) method to retrieve the value of an option in a given section. If raw is 1, no variable evaluation takes place. You can optionally pass in a dictionary of key-value pairs that get uses in the variable evaluation:


>>> cfg.get('Hoth', 'ID', 1)                       # Raw version
'%(team)s-1'
>>> cfg.get('Hoth', 'ID')                          # After variable evaluation
'gold-1'
>>> cfg.get('Hoth', 'ID', vars={'team':'blue'})    # Override values in the file
'blue-1'
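In Python 3's configparser the same three calls look like this; raw and vars are keyword-only arguments there. A sketch that loads just the Hoth section from a string:

```python
import configparser

cfg = configparser.ConfigParser()
cfg.read_string("[Hoth]\nID: %(team)s-1\nTeam=gold\n")

raw_id = cfg.get("Hoth", "ID", raw=True)                    # no evaluation
evaluated_id = cfg.get("Hoth", "ID")                        # uses Team=gold
override_id = cfg.get("Hoth", "ID", vars={"team": "blue"})  # vars take priority
```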
ConfigParser has a few other get convenience methods. getint(section, option) coerces the value into an integer before returning it, getfloat(section, option) does the same for floats, and getboolean(section, option) makes sure the value is a 0 or a 1 and returns it as an integer.

You can create a new section using the add_section(section) method, and you can set the value for an option by calling set(section, option, value):

>>> cfg.get('Server', 'port')
'50002'
>>> cfg.set('Server', 'port', '4000')   # Use string values!
>>> cfg.get('Server', 'port')
'4000'
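A Python 3 sketch of these methods follows. Note two differences from the behavior described above: set() raises TypeError for non-string values, and getboolean() returns True or False and also accepts spellings such as yes/no and on/off. The section and option names are made up for the example, and an in-memory StringIO stands in for a real file so the written text can be read straight back in:

```python
import configparser
import io

cfg = configparser.ConfigParser()
cfg.read_string("[Server]\nPort=50002\nTimeout=2.5\nVerbose=1\n")

port = cfg.getint("Server", "port")             # '50002' -> 50002
timeout = cfg.getfloat("Server", "timeout")     # '2.5' -> 2.5
verbose = cfg.getboolean("Server", "verbose")   # '1' -> True

cfg.add_section("Client")
cfg.set("Client", "retries", "3")               # values must be strings

try:
    cfg.set("Client", "retries", 3)             # a bare int is rejected
    non_string_rejected = False
except TypeError:
    non_string_rejected = True

out = io.StringIO()
cfg.write(out)                                  # INI text, readable again
reloaded = configparser.ConfigParser()
reloaded.read_string(out.getvalue())
```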

The write(file) method writes the configuration file out to the given filelike object. The output is guaranteed to be readable by a future call to read or readfp.

The remove_option(section, option) method removes the given option from the given section. If the option didn't exist, remove_option returns 0; otherwise, it returns 1. remove_section(section) removes the given section from the config file. As with remove_option, remove_section returns 0 if the section didn't exist and 1 otherwise.
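In Python 3 these methods return True or False instead of 1 or 0 (and remove_option raises NoSectionError if the section itself is missing). A short sketch:

```python
import configparser

cfg = configparser.ConfigParser()
cfg.read_string("[Server]\naddress=171.15.2.5\nport=50002\n")

first = cfg.remove_option("Server", "port")    # True: the option existed
second = cfg.remove_option("Server", "port")   # False: nothing left to remove
gone = cfg.remove_section("Server")            # True: the section existed
missing = cfg.remove_section("Nowhere")        # False: no such section
```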


Understanding Error Names 

When an error occurs in the os module, it usually raises the OSError exception (also available as os.error). OSError is a class, and instances of this class have the errno and strerror members that you can access to learn more about the problem:

>>> try:
...     os.close(-1)   # A bogus file descriptor
... except OSError, e:
...     print 'Blech! %s [Err #%d]' % (e.strerror,e.errno)
...
Blech! Bad file descriptor [Err #9]

The strerror member is the result of calling os.strerror(code) with the errno member of the exception:

>>> os.strerror(2)
'No such file or directory'

The errno module defines a symbolic name (such as ENOENT) for each error code. The list of defined errors varies by system (for example, the Windows version includes some Winsock error codes), but you can access the whole list through the errno.errorcode dictionary, which maps each numeric code to its symbolic name.
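The following Python 3 sketch pairs each numeric code with its symbolic name from errno.errorcode and its message from os.strerror. describe and bad_close are hypothetical helpers; the message text differs between platforms, so only the symbolic names are stable:

```python
import errno
import os

def describe(code):
    """Return 'SYMBOL: message' for a numeric error code."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(code))

def bad_close():
    """Repeat the bogus-descriptor example and return (errno, strerror)."""
    try:
        os.close(-1)
    except OSError as e:
        return e.errno, e.strerror
    return None
```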
For errors involving files or directories, the filename member of OSError has a non-empty value:

>>> try:
...     os.open('asdfsf', os.O_RDONLY)
... except OSError, e:
...     print e.errno, e.filename, e.strerror
...
2 asdfsf No such file or directory
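A Python 3 version of the same check, assuming a file name in the temporary directory that does not exist. In Python 3 the exception is FileNotFoundError, a subclass of OSError, so an except OSError clause still catches it; try_open is a hypothetical name:

```python
import os
import tempfile

def try_open(path):
    """Return (errno, filename) from the error raised when path can't be opened."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:        # FileNotFoundError is a subclass of OSError
        return e.errno, e.filename
    os.close(fd)                # the file unexpectedly existed
    return None

missing = os.path.join(tempfile.gettempdir(), "asdfsf-no-such-file")
```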


Handling Asynchronous Signals

The signal module lets your programs handle asynchronous process signals. If you've used the underlying C equivalents, you'll find that the Python version is pretty similar. A signal is just a message sent from the operating system or another process to the current process; most signals aren't handled directly by the process but are handled by default behavior in the operating system.

The signal module lets you register handler functions that override the default behavior and let your process respond to the signal itself. To register a signal handler, call signal.signal(num, handler), where num is the signal to handle and handler is your handler function. A signal handler should take two arguments: the signal number and a frame object containing the current stack frame. Instead of a function, handler can also be signal.SIG_DFL (meaning that you want the default behavior to occur for that signal) or signal.SIG_IGN (meaning that you want that signal to be ignored). The signal function returns the previously installed handler.
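Here is a minimal sketch of installing a handler and then restoring the previous one, written for Python 3.8 or later because it uses signal.raise_signal to send SIGINT to the current process. It must run in the main thread; record_signal and demo are hypothetical names:

```python
import signal

received = []

def record_signal(signum, frame):
    """Handler: called with the signal number and the current stack frame."""
    received.append(signum)

def demo():
    """Install record_signal for SIGINT, signal ourselves, then restore."""
    previous = signal.signal(signal.SIGINT, record_signal)  # returns old handler
    try:
        installed = signal.getsignal(signal.SIGINT) is record_signal
        signal.raise_signal(signal.SIGINT)   # Python 3.8+: signal this process
    finally:
        signal.signal(signal.SIGINT, previous)  # put the old handler back
    return previous, installed
```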

The signals that you can process vary by platform and are defined in your platform's signal.h file, but Table 11-4 lists some of the most common signals.


Table 11-4
Common Signals

Name      Description
SIGINT    Interrupt (Ctrl-C hit)
SIGQUIT   Quit the program
SIGTERM   Request program termination
SIGFPE    Floating-point error occurred (for example, division by zero, overflow)
SIGALRM   Alarm signal (not supported on Windows)
SIGBUS    Bus error
SIGHUP    Terminal line hangup
SIGSEGV   Illegal storage access
The getsignal(signalnum) function returns the current handler for the specified signal. It returns a callable Python object, SIG_DFL, SIG_IGN, or None (for non-Python signal handlers). default_int_handler is the handler Python installs for SIGINT by default; it raises KeyboardInterrupt.

Except for handlers for SIGCHLD, which follow the underlying implementation, signal handlers stay installed until you explicitly reset them, regardless of how the underlying C library behaves. Even though the signal handling happens asynchronously, Python dispatches the signals between bytecode instructions, so a long call into a C extension module could delay the arrival of some signals.

On UNIX, you can call signal.pause() to wait until a signal arrives (at which time the correct handler receives it). signal.alarm(time) causes the system to send a SIGALRM signal to the current process after time seconds; it returns the number of seconds left until the previous alarm would have gone off (if any). Calling alarm cancels any previous alarm, and a time of 0 removes any current alarm. You can also call os.kill(pid, sig) to send the given signal to the process with the ID of pid.
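The alarm call is enough to build a general timeout wrapper. This sketch assumes Python 3 on UNIX; run_with_timeout and Timeout are hypothetical names. Because alarm counts whole seconds, the limit is an integer, and the finally clause cancels the alarm and puts the old handler back whether or not the call finished in time:

```python
import signal
import time

class Timeout(Exception):
    """Raised by the SIGALRM handler when the time limit runs out."""

def _on_alarm(signum, frame):
    raise Timeout()

def run_with_timeout(func, seconds):
    """Call func() with a time limit; return (True, result) or (False, None).

    UNIX only: relies on SIGALRM and signal.alarm(), which takes whole seconds.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return True, func()
    except Timeout:
        return False, None
    finally:
        signal.alarm(0)                          # cancel a pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

Since Python 3.5, a system call such as time.sleep() that is interrupted by a signal is retried automatically unless the handler raises an exception, which is why the handler raises Timeout instead of just setting a flag.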

Caution Be careful when using threads and signals in the same program. In such cases you should call signal.signal only from the main thread (although other threads can call alarm, pause, and getsignal). Be aware that Python always runs signal handlers in the main thread, regardless of which thread the underlying implementation delivers the signal to.

The following example prompts the user for input, but times out if the user doesn't respond in the allotted time (it uses signal.alarm, so it works only on UNIX systems):

import signal, sys

class Timeout(Exception):
    pass

def handler(sig, frm):
    raise Timeout  # Raise an exception when time runs out.

signal.signal(signal.SIGALRM, handler)  # Set up the handler.
try:
    signal.alarm(3)  # Send ALARM signal in 3 seconds (alarm takes whole seconds).
    while 1:
        print 'Enter code to halt detonation:',
        s = sys.stdin.readline()
        if s.strip() == 'stop':
            print 'You did it!'
            break
        print 'Sorry.'
    signal.alarm(0)  # Disable the alarm.
except:  # Handle all exceptions so Ctrl-C will blow you up too.
    print '\nSorry. Too late.\n*KABOOM*'

I saved the file as sig.py. Here's some sample output:

/work> python sig.py
Enter code to halt detonation: [Wait a few seconds.]
Sorry. Too late.
*KABOOM*

/work> python sig.py
Enter code to halt detonation: foo
Sorry.
Enter code to halt detonation: stop
You did it!


Summary 

Python’s great support for executing shell commands makes it an ideal solution as 
a scripting language or as a glue that holds various technologies together. Python 
also has ample functionality for starting, controlling, and monitoring child pro- 
cesses. In this chapter you learned to: 

- Launch other programs in the foreground or the background.
- Access process and system configuration information.
- Read and write human-readable configuration files.
- Use file descriptors.
- Interpret os error message codes.

In the next chapter you'll learn to convert data between various formats, compress it, and decompress it. You'll also learn to convert Python objects to byte streams that can be saved for later retrieval or transmitted across a network.




Storing Data 
and Objects 



This chapter covers the many ways that you can convert Python objects to some form suitable for storage. Storage, however, is not limited to just saving data to disk. By the end of this chapter you'll be able to take a Python object and stick it in a database, compress it, send it across a network connection, or even convert it to a format that a C program could understand.


Data Storage Overview

Python’s data storage features are easy to use, but before you 
say, “Hey, store this stuff” (it really is that easy), you should 
put some thought into how you might use the data down the 
road. The issues listed below are merely some things you 
should keep in mind; don’t worry too much yet about how 
actually to deal with them. 

Text versus binary 

If you're storing data to a file, you have to choose whether to store it in text or binary mode. A configuration file, for example, is in text mode because humans have to be able to read it and edit it with a text editor. It's often easier to debug your program if the output is stored in some human-readable format, and you can easily pass such a file around and use it on different platforms. Of course, storing data in a human-readable format means you have to handle the details of parsing it back in if you need to load it.

A binary mode representation of data often takes up less 
space, and can be processed faster if it is stored in fixed-size 
blocks or records. 


In This Chapter

Data storage overview
Loading and saving objects
Example: moving objects across a network
Using database-like storage
Converting to and from C structures
Converting data to standard formats
Compressing data
Compression

If the size of an object is an issue, compression may be something you want to consider. In return for some additional processing power, compression often significantly shrinks the size of your data, which could really help if you have a lot of data or are transferring it over slow network connections.

Byte order ("Endianness") 

The way a processor stores multibyte numbers in memory is either big-endian or little-endian:

>>> import sys
>>> print '"...%s-endian", Gulliver said.' % sys.byteorder
"...little-endian", Gulliver said.        # On my Intel box

Most Python programs wouldn't care about such a low-level detail, but if your data has the potential to end up on another platform (by copying a data file, for example), the program on the other end has to know the byte order of the data in order to understand the data.

Object state 

Before you store an object, you need to remember that some objects have state “outside” the Python interpreter. If you tried to save an open socket connection to disk, you certainly couldn't expect the connection to be open once you reload the socket.

Destination 

You should keep in mind the destination of your data, because knowing that may let 
you take advantage of features particular to that medium. Is it going to a file on 
disk? How about a network connection or a database? 

On the receiving end 

One last thing to consider is what the receiving end of your data will be (who will 
read it in the future?). If you are saving a file that your same program will read later, 
you can use just about whatever storage format you like. If a C program is on the 
other end, maybe you need to send it data in the form of a C structure. Or maybe 
you don’t even know who will read the data, so an industry standard format such as
XDR or XML may be the answer. 
Loading and Saving Objects

To save an object to disk, you convert it to a string of bytes that the program can later read back in to recreate the original object. If you're coming from a Java or C++ background, then you recognize this process as marshaling or serialization, but Python refers to making preserves out of your objects as pickling.


Pickling with pickle 

The pickle module converts most Python objects to and from a byte representation:

>>> import pickle
>>> stuff = [5,3.5,'Alfred']
>>> pstuff = pickle.dumps(stuff)
>>> pstuff
"(lp0\012I5\012aF3.5\012aS'Alfred'\012p1\012a."
>>> pickle.loads(pstuff)
[5, 3.5, 'Alfred']

The pstuff variable in the above example is a string of bytes, so it’s easy to send it 
to another computer via a network connection or write it out to a file. 

The pickle.dumps(object[, bin]) function returns the serialized form of an object, and pickle.dump(object, file[, bin]) sends the serialized form to an open filelike object. If the optional bin parameter is 0 (the default), the object is pickled in a text form. A value of 1 generates a slightly more compact but less readable binary form. Either form is platform-independent.

The pickle.loads(str) function unpickles an object, converting the given string to its original object form. pickle.load(file) reads a pickled object from the given filelike object and returns the original, unpickled object.
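Because each call to dump appends one complete pickle, you can store several objects in one file and read them back until load raises EOFError. In Python 3 the bin flag has become a protocol number (0 is the original text form, higher numbers are binary); this sketch uses an in-memory io.BytesIO in place of a real file, and save_all and load_all are hypothetical helpers:

```python
import io
import pickle

def save_all(objects, f, protocol=pickle.HIGHEST_PROTOCOL):
    """Write one pickle per object to the binary file object f."""
    for obj in objects:
        pickle.dump(obj, f, protocol)

def load_all(f):
    """Read pickles from f until the data runs out."""
    items = []
    while True:
        try:
            items.append(pickle.load(f))
        except EOFError:        # raised once no more pickles remain
            return items

buf = io.BytesIO()              # stands in for open('data.pkl', 'wb')
save_all([[5, 3.5, "Alfred"], {"one": 1}], buf)
buf.seek(0)
restored = load_all(buf)
```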

The load and dump functions are really shorthand ways of instantiating the Pickler and Unpickler classes:

>>> import StringIO
>>> s = StringIO.StringIO()    # Create a temp filelike object.
>>> p = pickle.Pickler(s, 1)   # 1 = binary
>>> p.dump([1,2,3])
>>> p.dump('Hello!')
>>> s.getvalue()               # See the pickled form.
']q\000(K\001K\002K\003e.U\006Hello!q\001.'
>>> s.seek(0)                  # Reset the "file."
>>> u = pickle.Unpickler(s)
>>> u.load()
[1, 2, 3]
>>> u.load()
'Hello!'
Using the Pickler and Unpickler classes is convenient if you need to pickle many objects, or if you need to pass the picklers around to other functions. You can also subclass them to create a custom pickler.

The cPickle module is a C version of the pickle module, making it up to several orders of magnitude faster than the pure Python pickle module. Anytime you need to do lots of pickling, use cPickle. Objects pickled by cPickle are compatible with those pickled by pickle, and vice versa. The only drawback to the cPickle module is that you can't subclass Pickler and Unpickler.

>>> import cPickle,pickle
>>> s = cPickle.dumps({'one':1,'two':2})
>>> pickle.loads(s)
{'one': 1, 'two': 2}

As Python evolves, future versions could change the format of pickled objects. To 
prevent disasters, each version of the format has a version number, and pickle has 
a list of other versions (in addition to the current one) that it knows how to read: 

>>> pickle.format_version
'1.3'
>>> pickle.compatible_formats
['1.0', '1.1', '1.2']    # It can read some pretty old objects.

If you try to unpickle an unsupported version, pickle raises an exception. 


What can I pickle? 

You can pickle numbers, strings, None, and containers (tuples, lists, and dictionaries) that contain “picklable” objects.

When you pickle built-in functions, your own functions, or class definitions, pickle stores the object's name along with the name of the module in which it was defined, but not its implementation. In order to unpickle such an object, pickle first imports its module, so you must define the function or class at the top level of that module.

To save an instance object, pickle calls its __getstate__ method, which should return whatever information you need to capture the state of the object. When Python loads the object, pickle instantiates a new object and calls its __setstate__ method, passing it the unpickled version of its state:

>>> class Point:
...     def __init__(self,x,y):
...         self.x = x; self.y = y
...     def __str__(self):
...         return '(%d,%d)' % (self.x,self.y)
...     def __getstate__(self):
...         print 'Get state called!'
...         return (self.x,self.y)
...     def __setstate__(self,state):
...         print 'Set state called!'
...         self.x,self.y = state
...
>>> p = Point(10,20)
>>> z = pickle.dumps(p)
Get state called!
>>> newp = pickle.loads(z)
Set state called!
>>> print newp
(10,20)

If an object doesn't have a __getstate__ member, pickle saves the contents of its __dict__ member. When unpickling an object, the load function doesn't normally call the object's constructor (__init__). If you really want load to call the constructor, implement a __getinitargs__ method. As it saves the object, pickle calls __getinitargs__ for a tuple of arguments that it should pass to __init__ when the object is later loaded.

You can add pickling support for data types in C extension modules using the copy_reg module. To add support, you register a reduction function and a constructor for the given type by calling copy_reg.pickle(type, reduction_func[, constructor_ob]). For example, imagine you're creating a C extension module that determines the right stocks to trade on the stock market, and that the module defines a new data type called StockType (representing a particular security). Your constructor object (such as a function) returns a new StockType object and takes as arguments whatever data is needed to create such an object. Your reduction function takes a StockType object and returns a two-tuple containing a constructor object for creating a new StockType object (most likely the same constructor function mentioned above) and a tuple of arguments to pass to that constructor. After you register your functions for the new type, pickle can serialize any StockType object.
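The registration can be tried out without writing C. In this Python 3 sketch (the module is named copyreg there) a plain Python class stands in for the hypothetical C type StockType, make_stock is the constructor object, and reduce_stock is the reduction function returning the (constructor, arguments) pair; all three names are made up for the example:

```python
import copyreg
import pickle

class StockType:
    """Hypothetical stand-in for a type defined in a C extension module."""
    def __init__(self, symbol, price):
        self.symbol = symbol
        self.price = price

def make_stock(symbol, price):
    """Constructor object: rebuilds a StockType from plain arguments."""
    return StockType(symbol, price)

def reduce_stock(stock):
    """Reduction function: returns (constructor, argument tuple)."""
    return make_stock, (stock.symbol, stock.price)

copyreg.pickle(StockType, reduce_stock)   # Python 3 spelling of copy_reg.pickle

data = pickle.dumps(StockType("ACME", 10.5))
restored = pickle.loads(data)             # calls make_stock("ACME", 10.5)
```

The pickle refers to make_stock by module and name, so, as with other functions, make_stock must be importable at the top level of its module when the data is loaded.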

See Chapter 29 for information on writing your own extension modules.



Other pickling issues 

Because pickling a class doesn't store the class implementation, you can usually change the class definition without breaking your pickled data (you can still unpickle instance objects that were saved previously).

Multiple references to a particular object also reference a single object once you unpickle it. In the following example, a list has two members that are both references to another list. After pickling and unpickling it, the two members still refer to a single object:

>>> z = [1,2,3]
>>> y = [z,z]
>>> y[0] is y[1]   # Two references to the same object
1
>>> s = pickle.dumps(y)
>>> x = pickle.loads(s)
>>> x
[[1, 2, 3], [1, 2, 3]]
>>> x[0] is x[1]   # Both members still reference one object.
1


Of course, if you use the same Pickler to pickle an object, modify it, and pickle it again, the pickler saves only the first version of the object.
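The memo that remembers already-pickled objects lives in the Pickler, which is why a second dump of a modified object through the same pickler only records a reference back to the first copy. A Python 3 sketch:

```python
import io
import pickle

data = [1]
buf = io.BytesIO()
pickler = pickle.Pickler(buf)
pickler.dump(data)         # pickles [1] and remembers the list in the memo
data.append(2)             # modify the same list object
pickler.dump(data)         # memo hit: writes only a reference to the first copy

buf.seek(0)
unpickler = pickle.Unpickler(buf)
first = unpickler.load()
second = unpickler.load()  # resolves the reference to the object loaded first

fresh = pickle.loads(pickle.dumps(data))   # a new pickler has an empty memo
```

If you want each dump to capture the object's current state, call pickler.clear_memo() between dumps.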



If an error occurs while pickling to a filelike object (for example, you try to serialize a module), pickle raises the PicklingError exception, but it may have already written bytes to the file. The contents of the file will be in an unknown state and not too trustworthy.


The marshal module 

Under the covers, the pickle module calls the marshal module to do some of its work, but most programs should not use marshal at all. The one advantage of marshal is that, unlike pickle, it can handle code objects (the implementation itself):

>>> def adder(a,b):
...     return a+b
...
>>> adder(10,2)
12
>>> import marshal
>>> s = marshal.dumps(adder.func_code)
>>> def newadder():
...     pass
...
>>> newadder.func_code = marshal.loads(s)
>>> newadder(20,10)
30
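In Python 3 the attribute is named __code__ rather than func_code, and types.FunctionType can build a new function directly from the unmarshaled code object. A sketch (newadder is just the name used above); keep in mind that marshal's format can change between Python versions, so this is only reliable between interpreters of the same version:

```python
import marshal
import types

def adder(a, b):
    return a + b

code_bytes = marshal.dumps(adder.__code__)   # serialize the code object only
code = marshal.loads(code_bytes)             # rebuild an equivalent code object
# Wrap the code in a new function that uses this module's globals.
newadder = types.FunctionType(code, globals(), "newadder")
```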

Cross-Reference

Chapter 33 shows you how to access code objects and other attributes of Python objects such as functions.


Example: Moving Objects Across a Network 

The example in this section puts all this pickling stuff to work for you. Listing 12-1 is the swap module that creates a background thread that sends objects between two Python interpreters running in interactive mode. Although it works on a single computer, you can also run it between two separate computers if you change the IP address it uses.




Cross-Reference

Consider this example as a sneak preview. Chapter 15 covers networking and Chapter 26 covers threads.

Here is some sample output from the program in Listing 12-1 (I opened two separate MS-DOS windows on the same computer). After the sample output is a short explanation of how the program works. The first half of the output shows what is happening in the first window, and the second half shows the other window, although both programs are running at the same time and interacting:

C:\temp>python -i -c "import swap"
Listen thread started.
Use swap.send(obj) to send an object
Look in swap.obj to see a received object
>>> swap.send(['game','of','the','year'])   # See Obj1 below.
Received new object
(5, 10)                                     # Obj2 from below
>>> swap.obj
(5, 10)
>>> swap.obj[1]   # Yep, it's a real Python object!
10

C:\temp>python -i -c "import swap"
Listen thread started.
Use swap.send(obj) to send an object
Look in swap.obj to see a received object
Received new object
['game', 'of', 'the', 'year']               # Obj1 from above
>>> swap.obj[2]   # Poke around a little
'the'
>>> swap.send((5,10))   # See Obj2 above

Once both interpreters are up and running, they connect to each other via a network socket. Anytime you call swap.send(obj) in one interpreter, swap sends your object to the other interpreter, which stores it in swap.obj. Either side can send any picklable object to the other.

Notice that I started the Python interpreter using the “-c” argument (telling it to execute the command import swap) and the “-i” argument (telling it to keep the interpreter running after it executes its command). This feature lets you start with the swap module already loaded and running.
Listing 12-1: swap.py - Swap Objects Between Python Interpreters

from socket import *
import cPickle, threading

ADDR = '127.0.0.1'  # '127.0.0.1' = localhost
PORT = 50000
bConnected = 0

def send(obj):
    "Sends an object to a remote listener"
    if bConnected:
        conn.send(cPickle.dumps(obj,1))
    else:
        print 'Not connected!'

def listenThread():
    "Receives objects from remote side"
    global bServer, conn, obj, bConnected
    while 1:
        # Try to be the server.
        s = socket(AF_INET,SOCK_STREAM)
        try:
            s.bind((ADDR,PORT))
            s.listen(1)
            bServer = 1
            conn = s.accept()[0]
        except Exception, e:
            # Probably already in use, so I'm the client.
            bServer = 0
            conn = socket(AF_INET,SOCK_STREAM)
            conn.connect((ADDR,PORT))

        # Now just accept objects forever.
        bConnected = 1
        while 1:
            o = conn.recv(8192)
            if not o: break
            obj = cPickle.loads(o)
            print 'Received new object'
            print obj
        bConnected = 0

# Start up listen thread.
threading.Thread(target=listenThread).start()
print 'Listen thread started.'
print 'Use swap.send(obj) to send an object'
print 'Look in swap.obj to see a received object'
Note For the sake of simplicity, the example leaves out a lot of error checking that you'd want if you were to use this for something important.

This module has two functions: send and listenThread. send takes any object you pass in, pickles it, and sends it out through the socket that is connected to the other Python interpreter.

The listenThread function loops forever, waiting for objects to come in over the socket. When the function first starts, it tries to bind to the given IP address and port so it can act as the server side of the connection. If this attempt fails, it assumes that the bind failed because the other interpreter is already acting as the server, so listenThread tries to connect (thus becoming the client side of the connection). Once connected, listenThread receives each object, unpickles it, prints it out, and also saves it to the global variable obj so that you can then fiddle with it in your interpreter.

At the module level, a call to threading.Thread().start() starts the listening thread. By placing the call there, the background thread starts up automatically as soon as you import the module.
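One limitation of Listing 12-1 is that a single recv call may return only part of a large pickle. A common fix, sketched here for Python 3, is to prefix each pickle with its length (packed with the struct module described later in this chapter) and read until that many bytes arrive. socket.socketpair supplies two already-connected sockets so the sketch runs in one process; send_obj, recv_obj, and _recv_exact are hypothetical names. As with any pickle data, only exchange objects with peers you trust, because unpickling can run arbitrary code:

```python
import pickle
import socket
import struct

def send_obj(sock, obj):
    """Send obj as a 4-byte big-endian length followed by its pickle."""
    data = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    sock.sendall(struct.pack("!I", len(data)) + data)

def _recv_exact(sock, n):
    """Read exactly n bytes; raise EOFError if the peer closes first."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError("connection closed")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def recv_obj(sock):
    """Receive one object written by send_obj()."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))

left, right = socket.socketpair()   # two connected sockets in one process
```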

After you've played around with this a little, sit back and relish the fact that all this power required a measly 50 lines of Python code!

Using Database-Like Storage 

The shelve module enables you to save Python objects into persistent, database-like storage, similar to the dbm module.

See Chapter 14 for information on dbm and other Python database modules.


The shelve.open(file[, mode]) function opens and returns a shelve object. The mode parameter (which is the same as the mode parameter to dbm.open) defaults to ‘c’, which means that the function opens the database for reading and writing, and creates it if it doesn't already exist. Use the close() method of the shelve object when you are finished using it.

You access the data as if the database were a dictionary: 

>>> import shelve
>>> db = shelve.open('objdb')   # Don't use a file extension!
>>> db['secretCombination'] = [5,23,17]
>>> db['account'] = 5671012
>>> db['secretCombination']
[5, 23, 17]
>>> del db['account']
>>> db.has_key('account')
0
>>> db.keys()
['secretCombination']
>>> db.close()

The shelve module uses pickle, so you can store any objects that pickle can store. shelve has the same limitations as dbm. Among other things, you should not use it to store large Python objects.
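The same session in Python 3, sketched with a temporary directory so it cleans up after itself. shelve objects work as context managers there, and the in operator replaces has_key; shelve_demo is a hypothetical name:

```python
import os
import shelve
import tempfile

def shelve_demo():
    """Store, delete, and reload entries; return what the shelf reported."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "objdb")   # no extension; dbm may add its own
        with shelve.open(path) as db:       # default flag 'c': create if needed
            db["secretCombination"] = [5, 23, 17]
            db["account"] = 5671012
            del db["account"]
            present = "account" in db
            keys = sorted(db.keys())
        with shelve.open(path, "r") as db:  # reopen read-only
            combo = db["secretCombination"]
    return present, keys, combo
```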

Converting to and from C Structures 

Although pickle makes converting Python objects to a byte stream easy, really only Python programs can convert them back to objects. The struct module, however, lets you create a string of bytes equivalent to a C structure, so you could read and write binary files generated by a non-Python program or send binary network messages to something besides a Python interpreter.

To use struct, you call struct.pack(format, v1, v2, ...) with a format string describing the layout of the data followed by the data itself. Construct the format string using format characters listed in Table 12-1.


Table 12-1
struct Format Characters

Character   C type            Python type
c           char              string of length 1
s           char[]            string
p           (Pascal string)   string
i           int               integer
I           unsigned int      integer or long*
b           signed char       integer
B           unsigned char     integer
h           short             integer
H           unsigned short    integer
l           long              integer
L           unsigned long     long
f           float             float
d           double            float
x           (pad byte)        -
P           void *            integer or long*

* The type Python uses is based on whether a pointer for this platform is 32 or 64 bits.
For example, to create the equivalent of this C struct:

struct {
    int a;
    int b;
    char c;
};

with the values 10, 20, and ‘Z’, use:

>>> import struct
>>> z = struct.pack('i i c',10,20,'Z')
>>> z
'\012\000\000\000\024\000\000\000Z'

Given a string of bytes in a particular format, you can convert them to Python objects by calling struct.unpack(format, data). It returns a tuple of the reconstructed data:

>>> struct.unpack('i i c',z)
(10, 20, 'Z')

The format string you pass to unpack must account for all the data in the string you pass it, or struct raises an exception. Use the struct.calcsize(format) function to figure out how many bytes would be taken up by the given format string:

>>> struct.calcsize('i i c')
9
>>> len(z)   # The earlier example verifies this.
9
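In Python 3, struct works with bytes objects, so the ‘c’ value is written b'Z'. This sketch repeats the C struct example with the ‘<’ modifier (described later in this section) so the bytes come out the same on every platform, and shows the exception you get when the data is longer than the format describes:

```python
import struct

# struct { int a; int b; char c; } with a=10, b=20, c='Z'
record = struct.pack("<iic", 10, 20, b"Z")
a, b, c = struct.unpack("<iic", record)
size = struct.calcsize("<iic")

try:
    struct.unpack("<iic", record + b"!")   # one byte too many for the format
    length_checked = False
except struct.error:
    length_checked = True
```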

As a shortcut, you can put a number in front of any format character to repeat that data type that many times:

>>> struct.pack('3f',1.2,3.4,5.6)   # '3f' is the same as 'fff'
'\232\231\231?\232\231Y@33\263@'

For clarity, you can put whitespace between format characters in your format string (but not between the format character and a repeater number):

>>> struct.pack('2i h 3c',5,6,7,'a','b','c')
'\005\000\000\000\006\000\000\000\007\000abc'

The repeater number works a little differently with the ‘s’ (string) format character. The repeater tells the length of the string (5s means a 5-character string). 0s means an empty string, but 0c means zero characters.

The ‘I’ format character unpacks the given number to a Python long integer if the C int and long are the same size. If the C int is smaller than the C long, ‘I’ converts the number to a Python integer.
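A Python 3 sketch of repeat counts and the ‘s’ rules. Note that unpacking an ‘s’ field returns the padding bytes too; struct does not strip them:

```python
import struct

floats = struct.pack("<3f", 1.5, 2.5, 3.5)   # same as "<fff"
padded = struct.pack("5s", b"hi")            # a 5-byte string field, null padded
empty = struct.pack("0s", b"ignored")        # one empty string
no_chars = struct.pack("0c")                 # zero characters, no arguments
```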
The ‘p’ format character is for a Pascal string. Pascal uses the first byte to store the length of the string (so strings longer than the maximum length of 255 are truncated), and then the characters in the string follow. If you supply a repeater number with this format character, it represents the total number of bytes in the string including the length byte. If the string is shorter than the specified number of bytes, pack adds null padding bytes to bring it up to snuff.
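A Python 3 sketch of the ‘p’ rules: the count includes the length byte, so at most count - 1 characters are stored:

```python
import struct

short = struct.pack("5p", b"hi")        # length byte, 'hi', 2 padding bytes
clipped = struct.pack("4p", b"hello")   # only 4 - 1 = 3 characters fit
(text,) = struct.unpack("5p", short)    # the length byte tells unpack where to stop
```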

By default, struct stores numbers using the native format for byte order and structure member alignment (whatever your current platform's C compiler would use). You can override this behavior by starting your format string with one of the modifiers listed in Table 12-2. For example, you can force struct to use network order, a standard byte ordering for network messages:

>>> struct.pack('ic', 65535, 'D')    # Native is little-endian.
'\377\377\000\000D'
>>> struct.pack('!ic', 65535, 'D')   # Force network order.
'\000\000\377\377D'


Table 12-2
Order, Alignment, and Size Modifiers

Modifier   Byte order             Alignment   Size
<          Little-endian          None        Standard
> or !     Big-endian (Network)   None        Standard
=          Native                 None        Standard
@          Native                 Native      Native


If you don’t choose a modifier from Table 12-2, struct uses native byte ordering,
alignment, and size. When you use a modifier whose size is “Standard,” a C short
takes up 2 bytes, an int, long, or float uses 4, and a double uses 8.

If you need to have alignment but aren’t using the ‘@’ (native alignment) modifier, 
you can insert pad bytes using the ‘x’ format character from Table 12-1. If you need 
to force the end of a structure to be aligned according to the alignment rules for a 
particular type, you can end your format string with the format code for that type 
with a count of 0. The following example shows how to force a single-character 
structure to end on an integer boundary: 

>>> struct.pack('c','A')
'A'
>>> struct.pack('c0i','A')
'A\000\000\000'
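Both points above (standard sizes and alignment) can be checked with struct.calcsize. A minimal Python 3 sketch: the standard-mode results are the same everywhere, while the native ('@') results depend on the platform's C compiler, so no fixed output is claimed for them.

```python
import struct

# Standard sizes apply whenever the format starts with '<', '>', '!', or '='.
for code in 'hilfd':
    print(code, struct.calcsize('<' + code))   # h 2, i 4, l 4, f 4, d 8

# The standard modes add no alignment padding...
print(struct.calcsize('<ci'))                   # 5
# ...so insert pad bytes yourself with 'x' when a layout calls for them.
print(struct.pack('<c3xi', b'A', 1))            # b'A\x00\x00\x00\x01\x00\x00\x00'

# Native mode ('@', the default) aligns the int. The amount of padding
# depends on the platform's C compiler.
print(struct.calcsize('@ci'))
print(struct.pack('@c0i', b'A'))   # 'A' padded out to an int boundary
```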


The ‘P’ (pointer) format character is available with native alignment only. 
The struct module is very useful for reading and writing binary files. For example,
if you read the first 36 bytes of a Windows WAV file, you can use struct to extract
some information about the file. The header of a WAV file starts with:

'RIFF' (4 bytes)
little-endian length field (4 bytes)
'WAVE' (4 bytes)
'fmt ' (4 bytes)
format subchunk length (4 bytes)
format specifier (2 bytes)
number of channels (2 bytes)
sample rate in Hertz (4 bytes)
bytes per second (4 bytes)
bytes per sample (2 bytes)
bits per channel (2 bytes)

One way to represent this header would be with the format string 

'<4s i 4s 4s ihhiihh'

The following code extracts this information from a WAV file:

>>> s = open('c:\\winnt\\media\\ringin.wav','rb').read(36)
>>> struct.unpack('<4si4s4sihhiihh',s)
('RIFF', 10018, 'WAVE', 'fmt ', 16, 1, 1, 11025, 11025, 1, 8)
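struct works just as well in the other direction. The following Python 3 sketch builds the same 36-byte header from a few parameters. The helper name make_wav_header is hypothetical, and it assumes a plain PCM file that holds only a 'fmt ' chunk followed by a 'data' chunk, so the RIFF length field is 36 plus the length of the sample data. With the parameters of the file above it reproduces the tuple shown.

```python
import struct

WAV_FORMAT = '<4si4s4sihhiihh'   # the 36-byte header layout from the text

def make_wav_header(sample_rate, channels, bits_per_channel, data_length):
    """Pack the first 36 bytes of a PCM WAV header (hypothetical helper).

    Assumes only a 'fmt ' chunk and a 'data' chunk follow, so the RIFF
    length field is 36 + data_length.
    """
    bytes_per_sample = channels * bits_per_channel // 8      # "block align"
    bytes_per_second = sample_rate * bytes_per_sample
    return struct.pack(WAV_FORMAT,
                       b'RIFF', 36 + data_length, b'WAVE', b'fmt ',
                       16,               # length of the format subchunk
                       1,                # format specifier: 1 = PCM
                       channels, sample_rate, bytes_per_second,
                       bytes_per_sample, bits_per_channel)

header = make_wav_header(11025, 1, 8, 9982)
print(len(header))                        # 36
print(struct.unpack(WAV_FORMAT, header))
# (b'RIFF', 10018, b'WAVE', b'fmt ', 16, 1, 1, 11025, 11025, 1, 8)
```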

Extending that example, the following function rates the sound quality of a given
WAV file:

>>> def rateWAV(filename):
...     format = '<4si4s4sihhiihh'
...     fsize = struct.calcsize(format)
...     data = open(filename,'rb').read(fsize)
...     data = struct.unpack(format,data)
...     if data[0] != 'RIFF' or data[2] != 'WAVE':
...         print 'Not a WAV file!'
...         return
...     rate = data[7]
...     if rate == 11025:
...         print 'Telephone quality!'
...     elif rate == 22050:
...         print 'Radio quality!'
...     elif rate == 44100:
...         print 'Oooh, CD quality!'
...     else:
...         print 'Rate is %d Hz' % rate
...

>>> rateWAV(r'c:\winnt\media\notify.wav')
Radio quality!
>>> rateWAV('online.wav')
Oooh, CD quality!
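For comparison, here is a sketch of the same idea in Python 3, where the header fields come back as bytes. This version (rate_wav and rate_wav_file are hypothetical names) returns its verdict instead of printing it, precompiles the format with struct.Struct, and rejects input that is too short to hold a header rather than letting unpack raise an exception.

```python
import struct

WAV_HEADER = struct.Struct('<4si4s4sihhiihh')
QUALITY = {11025: 'Telephone quality!',
           22050: 'Radio quality!',
           44100: 'Oooh, CD quality!'}

def rate_wav(header):
    """Describe the sample rate found in a WAV header given as bytes."""
    if len(header) < WAV_HEADER.size:
        return 'Not a WAV file!'
    fields = WAV_HEADER.unpack(header[:WAV_HEADER.size])
    if fields[0] != b'RIFF' or fields[2] != b'WAVE':
        return 'Not a WAV file!'
    rate = fields[7]                       # sample rate in Hertz
    return QUALITY.get(rate, 'Rate is %d Hz' % rate)

def rate_wav_file(filename):
    """Read just enough of the file for the header, then rate it."""
    with open(filename, 'rb') as f:
        return rate_wav(f.read(WAV_HEADER.size))
```

Keeping the parsing separate from the file I/O makes rate_wav easy to test with headers built in memory.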
Converting Data to Standard Formats

Now that you have the struct module under your belt, you can build on that
knowledge to read and write just about any file format. If your data needs to be
readable by your own programs only, then you can create your own convention for
storing data. In other cases, however, you may find it useful to convert your data to
an industry-wide standard.


Sun's XDR format 

The XDR (eXternal Data Representation) format is a standard data format created
by Sun Microsystems. RFC 1832 defines the format, and its most common use is in
NFS (Network File System). Storing data in a standard format like XDR makes sharing
files easier across different hardware platforms and operating systems.

The xdrlib module implements a subset of the XDR format, leaving out some of
the less-used data types. To convert data to XDR, you create an instance of the
xdrlib.Packer class, and to convert from XDR, you create an instance of
xdrlib.Unpacker.


Packer objects 

The Packer constructor takes no arguments: 

>>> import xdrlib 
>>> p = xdrlib.Packer() 


Once you have a Packer object you can use any of its pack_<type> methods to 
pack basic data types: 


>>> p.pack_float(3.5)     # 32-bit floating point number
>>> p.pack_double(10.5)   # 64-bit floating point number
>>> p.pack_int(-15)       # Signed 32-bit integer
>>> p.pack_uint(15)       # Unsigned 32-bit integer
>>> p.pack_hyper(100)     # Signed 64-bit integer
>>> p.pack_uhyper(200)    # Unsigned 64-bit integer
>>> p.pack_enum(3)        # Enumerated type
>>> p.pack_bool(1)        # Booleans are 1 or 0
>>> p.pack_bool("Hi")     # Value is true, so stores a 1


The pack_fstring(count, str) method packs a fixed-length string count characters
long. The function does not store the size of the string, so to unpack it you
have to know how long it is beforehand. Better yet, use pack_string(str), which
lets you pack a variable-length string:


>>> p.pack_string('Lovely')
>>> p.pack_fstring(3,'day')
The pack_string function calls pack_uint with the size of the string and then
pack_fstring with the string itself. To more fully follow the XDR specification, a
Packer object also has pack_bytes and pack_opaque methods, but they are really
just calls to pack_string. Likewise, a call to pack_fopaque is really just a call to
pack_fstring.
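xdrlib was deprecated in Python 3.11 and removed in Python 3.13 (PEP 594), but the wire format is simple enough to produce with struct. A minimal sketch, assuming the RFC 1832 rules: a string is a big-endian 32-bit length followed by the bytes, zero-padded to a multiple of four. The function names are hypothetical, and xdr_fstring is stricter than xdrlib's pack_fstring, which silently truncates or pads a string whose length does not match the count.

```python
import struct

def xdr_uint(n):
    # XDR unsigned int: 4 bytes, big-endian
    return struct.pack('>I', n)

def xdr_fstring(n, data):
    # Fixed-length opaque data: exactly n bytes, zero-padded to a multiple of 4.
    if len(data) != n:
        raise ValueError('expected %d bytes, got %d' % (n, len(data)))
    padding = (4 - n % 4) % 4
    return data + b'\0' * padding

def xdr_string(data):
    # Variable-length string: the length as a uint, then the fixed-length body.
    return xdr_uint(len(data)) + xdr_fstring(len(data), data)

print(xdr_string(b'Lovely'))   # b'\x00\x00\x00\x06Lovely\x00\x00'
print(xdr_fstring(3, b'day'))  # b'day\x00'
```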

The pack_farray(count, list, packFunc) function packs a fixed-length array
(count items long) of homogeneous data. Unfortunately, pack_farray requires that
you pass in the count as well as the list itself, but it won’t let you use a count that is
different from the length of the list (go figure). As with pack_fstring, the function
does not store the length of the array with the data, so you have to know the length
when you unpack it. Or you can call pack_array(list, packFunc) to pack the
size and then the list itself. The packFunc tells Packer which method to use to
pack each item. For example, if each item in the list is an integer:

>>> p.pack_array([1,2,3,4],p.pack_int)

The pack_list(list, packFunc) method also packs an array of homogeneous data, but
it works with sequence objects whose size might not be known ahead of time. For
example, you could create a class that defines its own __getitem__ method:

>>> class MySeq:
...     def __getitem__(self,i):
...         if i < 5:
...             return i
...         raise IndexError
...
>>> m = MySeq()
>>> for i in m:
...     print i
...
0
1
2
3
4
>>> p.pack_list(m,p.pack_int)
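The three array encodings differ only in how the length is recorded. This Python 3 sketch uses struct to mirror the bytes that xdrlib's pack_farray, pack_array, and pack_list produce (the function names are hypothetical). pack_list needs no length up front because every item is preceded by an unsigned 1 and the list ends with an unsigned 0, so xdr_list accepts any iterable, including an object like MySeq.

```python
import struct

def xdr_int(n):
    # XDR signed int: 4 bytes, big-endian
    return struct.pack('>i', n)

def xdr_uint(n):
    return struct.pack('>I', n)

def xdr_farray(n, items, pack_item):
    # Fixed-length array: no count is stored, so the reader must know n.
    if len(items) != n:
        raise ValueError('wrong array size')
    return b''.join(pack_item(item) for item in items)

def xdr_array(items, pack_item):
    # Variable-length array: a uint count, then the items themselves.
    return xdr_uint(len(items)) + xdr_farray(len(items), items, pack_item)

def xdr_list(iterable, pack_item):
    # Length need not be known in advance: each item is preceded by a uint 1,
    # and a final uint 0 marks the end of the list.
    out = b''
    for item in iterable:
        out += xdr_uint(1) + pack_item(item)
    return out + xdr_uint(0)

print(len(xdr_array([1, 2, 3, 4], xdr_int)))   # 20: 4-byte count + 4 items
print(len(xdr_list(range(5), xdr_int)))        # 44: 5 * (4 + 4) + 4
```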

The get_buffer() method returns a string representing the packed form of all the
data you’ve packed. reset() empties the buffer:

>>> p.reset()
>>> p.pack_int(10)
>>> p.get_buffer()
'\000\000\000\012'
>>> p.reset()
>>> p.get_buffer()
''
Unpacker objects

Not surprisingly, an Unpacker object has methods that closely mirror those of a
Packer object. When you construct an Unpacker, you pass in a string of bytes for it
to decode, and then begin calling its unpack_<type> methods (each pack_ method
has a corresponding unpack_ method):

>>> import xdrlib
>>> p = xdrlib.Packer()
>>> p.pack_float(2.0)
>>> p.pack_fstring(4,'Dave')
>>> p.pack_string('/export/home')
>>> u = xdrlib.Unpacker(p.get_buffer())
>>> u.unpack_float()
2.0
>>> u.unpack_fstring(4)
'Dave'
>>> u.unpack_string()
'/export/home'
>>> u.done()

The done() method tells the Unpacker that you are finished decoding data. If the
Unpacker still has data left in its internal buffer, it raises an xdrlib.Error exception to
inform you that the internal buffer has leftover data.

Calling the reset(str) method replaces the current buffer with the data in str. At
any time, you can call the get_buffer() method to retrieve the string representation
of the data stream.

You can use the get_position() and set_position(pos) methods to track and
reposition where in the buffer the Unpacker decodes from next. To be safe, set a
position to 0 or to a value returned from get_position.
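Decoding is the mirror image of packing. Because xdrlib is gone from Python 3.13 onward, here is a sketch of a tiny decoder built on struct. The class XdrReader is hypothetical. It imitates a handful of Unpacker methods, including get_position, set_position, and done, but it raises ValueError and EOFError rather than xdrlib's own exception classes.

```python
import struct

class XdrReader:
    """Hypothetical minimal XDR decoder mirroring a few Unpacker methods."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def get_position(self):
        return self.pos

    def set_position(self, pos):
        self.pos = pos

    def _take(self, n):
        # Return the next n bytes and advance the read position.
        if self.pos + n > len(self.data):
            raise EOFError('not enough data')
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def unpack_uint(self):
        return struct.unpack('>I', self._take(4))[0]

    def unpack_float(self):
        return struct.unpack('>f', self._take(4))[0]

    def unpack_fstring(self, n):
        body = self._take((n + 3) // 4 * 4)   # consume the padding too
        return body[:n]

    def unpack_string(self):
        return self.unpack_fstring(self.unpack_uint())

    def done(self):
        if self.pos < len(self.data):
            raise ValueError('unextracted data remains')

# The same values as the Packer example: a float, a 4-byte fstring, a string.
data = struct.pack('>f', 2.0) + b'Dave' + struct.pack('>I', 12) + b'/export/home'
r = XdrReader(data)
print(r.unpack_float())       # 2.0
print(r.unpack_fstring(4))    # b'Dave'
mark = r.get_position()       # remember where the string starts
print(r.unpack_string())      # b'/export/home'
r.done()                      # nothing left over, so no exception
r.set_position(mark)          # rewind and decode the string again
print(r.unpack_string())      # b'/export/home'
```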


Other formats 

Of course, you might use many other data formats. XML is gaining popularity as a 
data storage markup language; see Chapter 18 for more information. 

For any given file format, a quick search on a Web search engine locates many
documents describing the details of that format (for example, try searching for
“WAV spec”). Once you have that information, creating format strings that struct
can understand is usually a straightforward process.


Compressing Data 

This final section covers the use of zlib, a module wrapping the free zlib compression
library. The gzip and zipfile modules use zlib to manipulate GZIP and
ZIP files, respectively.


zlib 

You can use the zlib module to compress any sort of data; if you are transferring 
large messages over a network, it may be worthwhile to compress them first, for 
example. 

The most straightforward use of zlib is through the compress(string[, level])
and decompress(string[, wbits[, bufsize]]) functions. The level used during
compression is from 1 (fastest) to 9 (best compression), defaulting to 6. During
decompression, the wbits argument controls the size of the history buffer, and
should have a value between 8 and 15 (the default). A higher value consumes more
memory; a larger window improves the chances of better compression, and when
decompressing, wbits must be at least as large as the window used to compress the
data. The bufsize argument determines the initial size of the buffer used to hold
decompressed data. The library modifies this size as needed, so you never really
have to change it from its default of 16384. Both compress and decompress take a
string of bytes and return the compressed or decompressed equivalent:

>>> import zlib
>>> longString = 100 * 'That zlib module sure is fun!'
>>> compressed = zlib.compress(longString)
>>> len(longString); len(compressed)
2900
62 # Yay, zlib!
>>> zlib.decompress(compressed)[:40]
'That zlib module sure is fun!That zlib m'
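In Python 3 the same functions take and return bytes. A short sketch comparing the fastest and the strongest compression levels on the example text. The exact compressed sizes depend on the zlib version, so they are printed rather than asserted.

```python
import zlib

long_bytes = 100 * b'That zlib module sure is fun!'

fast = zlib.compress(long_bytes, 1)   # level 1: fastest
best = zlib.compress(long_bytes, 9)   # level 9: best compression
print(len(long_bytes), len(fast), len(best))

# decompress gives back exactly the original bytes.
restored = zlib.decompress(best)
print(restored == long_bytes)         # True
print(restored[:40])                  # b'That zlib module sure is fun!That zlib m'
```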

To learn more about zlib's features, visit the zlib Web site at
http://www.info-zip.org/pub/infozip/zlib/.

The zlib module has two functions for computing the checksum of a string (useful
in detecting changes and errors in data or as a way to warm your CPU),
crc32(string[, value]) and adler32(string[, value]). If present, the
optional value argument is the starting value of the checksum, so you can calculate
the checksum of several pieces of input. The following example shows you how
to use a checksum to detect data corruption:

>>> data = 'My dog has no fleas!'
>>> zlib.adler32(data)
1193871046
>>> data = data[:5] + 'z' + data[6:]
>>> data
'My doz has no fleas!' # A solar flare corrupts your data...
>>> zlib.adler32(data)
1212548825 # ... resulting in a different checksum.

The value returned from crc32 is more reliable than that returned from adler32,
but it also requires more computation. (More reliable means that the function
is less likely to return the same checksum if the data changes at all.) Don’t forget to
dazzle your friends by informing them that Mark Adler wrote the decompression
portion of zlib.
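The optional value argument is what makes checksums practical for data that arrives in pieces. A Python 3 sketch (where both functions return unsigned values): feeding the pieces one at a time, passing along the previous result, gives the same checksum as processing the whole message at once. Note that the starting value differs: 0 for crc32 and 1 for adler32.

```python
import zlib

message = b'My dog has no fleas!'
pieces = (b'My dog ', b'has no ', b'fleas!')

# Passing the previous result as the second argument continues the checksum.
crc = 0                      # crc32's default starting value
adler = 1                    # adler32's default starting value
for piece in pieces:
    crc = zlib.crc32(piece, crc)
    adler = zlib.adler32(piece, adler)

print(crc == zlib.crc32(message))        # True
print(adler == zlib.adler32(message))    # True

# A CRC-32 always changes when a single byte changes.
print(zlib.crc32(b'My doz has no fleas!') == zlib.crc32(message))   # False
```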
If you have more data than you can comfortably fit in memory, zlib lets you
create compression and decompression objects. Create a compression object by
calling compressobj([level]). Once you have your object, you can repeatedly call
its compress(string) method. Each call returns another portion of the compressed
version of the data, although some is saved for later processing. Calling the
compression object’s flush([mode]) method finishes the compression and
returns the remaining compressed data:


>>> c = zlib.compressobj(9)
>>> out = c.compress(1000 * 'I will not throw knives')
>>> out += c.compress(200 * 'or chairs')
>>> out += c.flush()
>>> len(out) # out holds the entire compressed stream.
115


If you call flush with a mode of Z_FULL_FLUSH or Z_SYNC_FLUSH, all the currently
buffered compressed data is returned, but you can later compress more data with
the same object. Without those mode values, the compression object assumes
you’re finished and doesn’t allow any additional compression.
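A sync flush is what you want when compressed data goes out in messages: the receiver can decode everything sent so far while the sender keeps using the same compressor. A Python 3 sketch of that pattern, reusing the messages from the example above (Z_FULL_FLUSH would work too; it additionally resets the compression state so decoding could restart at that point).

```python
import zlib

c = zlib.compressobj(9)
d = zlib.decompressobj()

# Compress a first batch and sync-flush it: everything so far becomes
# decodable, yet the compressor stays usable.
part1 = c.compress(1000 * b'I will not throw knives') + c.flush(zlib.Z_SYNC_FLUSH)
first_len = len(d.decompress(part1))
print(first_len)                         # 23000

# Keep going with the same object; a plain flush() (Z_FINISH) ends the stream.
part2 = c.compress(200 * b'or chairs') + c.flush()
second_len = len(d.decompress(part2))
print(second_len)                        # 1800
print(d.eof)                             # True
```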

You create a decompression object by calling zlib’s decompressobj([wbits])
function. A decompression object lets you decompress a stream of data one piece
at a time (for example, you could decompress a file by repeatedly reading a chunk
of data, decompressing that chunk, and writing the result to an output file).

Call the decompress(string) method of your decompression object to decompress
the next chunk of data. The decompress method returns the largest amount of
decompressed data that it can, although it may need to buffer some until you supply
more data to decompress. The following code decompresses the output from the
previous example 20 bytes at a time:

>>> d = zlib.decompressobj() # Create a decompressor.
>>> msg = ''
>>> while out:
...     msg += d.decompress(out[:20]) # Decompress some.
...     out = out[20:]
...
>>> msg += d.flush() # Let it know that we're all done.
>>> len(msg)
24800
>>> 1000 * len('I will not throw knives') +\
... 200 * len('or chairs')
24800 # Length matches that of the original message.
>>> msg[:50] # Looks like the message itself matches too.
'I will not throw knivesI will not throw knivesI wi'

Call the decompression object’s flush() method when you’re done giving it more
data (after this you can’t call decompress any more with that object).
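The file-to-file scenario mentioned above fits in a few lines. Here is a Python 3 sketch of it; decompress_stream is a hypothetical helper that works with any file-like objects, so the example uses io.BytesIO in place of real files and, like the session above, feeds the data 20 bytes at a time.

```python
import io
import zlib

def decompress_stream(src, dst, chunk_size=16384):
    """Copy zlib-compressed data from file object src to dst, chunk by chunk."""
    d = zlib.decompressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(d.decompress(chunk))
    dst.write(d.flush())   # whatever the decompressor was still holding

original = 1000 * b'I will not throw knives' + 200 * b'or chairs'
out = io.BytesIO()
decompress_stream(io.BytesIO(zlib.compress(original, 9)), out, chunk_size=20)
print(out.getvalue() == original)   # True
```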
Decompression objects also have an unused_data member that holds any bytes
that followed the end of the compressed stream in the data you passed to
decompress. Because the only way to find where a compressed stream ends is to
decompress it, unused_data stays empty until the decompressor reaches the end of the
stream; a nonempty unused_data string means that this particular piece of
compressed data is finished and the extra bytes belong to whatever came after it.
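A quick Python 3 check of that behavior: bytes appended after a complete stream end up in unused_data, while a decompressor that has seen only part of a stream reports an empty unused_data. The eof attribute used here was added in Python 3.3.

```python
import zlib

stream = zlib.compress(b'first message')
trailer = b'EXTRA'                       # bytes that are not part of the stream

d = zlib.decompressobj()
print(d.decompress(stream + trailer))    # b'first message'
print(d.unused_data)                     # b'EXTRA'
print(d.eof)                             # True: the end of the stream was seen

# Fed only part of the stream, the object is still waiting for more input,
# and unused_data stays empty.
d2 = zlib.decompressobj()
d2.decompress(stream[:5])
print(d2.unused_data, d2.eof)            # b'' False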

gzip 

The gzip module lets you read and write .gz (GNU gzip) files as if they were ordinary
files (that is, your program can pretty much ignore the fact that compression/
decompression is taking place).

Note The GNU gzip and gunzip programs support additional formats (for example,
compress and pack), but the gzip Python module does not.

The gzip.GzipFile([filename[, mode[, compresslevel[, fileobj]]]])
function constructs a new GzipFile object. You must supply either the filename
or the fileobj argument, although the file object can be anything that looks like a
file, such as a cStringIO object. The compresslevel parameter has the same
values as for the zlib module earlier in this section.

If you don’t supply a mode, then gzip tries to use the mode of fileobj. If that’s not
possible, the mode defaults to 'rb' (open for reading). A GzipFile can’t be open
for both reading and writing, so you should use a mode of 'rb', 'wb', or 'ab'.

When you call the close() method of a GzipFile, the file object (if you supplied
one) remains open.
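In Python 3, io.BytesIO takes the place of cStringIO. This sketch writes a compressed stream into an in-memory buffer, confirms that closing the GzipFile left the buffer open, and reads the data back from the same buffer. It also shows the one-shot gzip.compress and gzip.decompress helpers, available since Python 3.2.

```python
import gzip
import io

buf = io.BytesIO()                       # plays the role of cStringIO
with gzip.GzipFile(fileobj=buf, mode='wb', compresslevel=9) as gz:
    gz.write(b'Old woman!\nMan!\n')

# Closing the GzipFile left buf open, so it can be rewound and read back.
print(buf.closed)                        # False
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode='rb') as gz:
    text = gz.read()
print(text)                              # b'Old woman!\nMan!\n'

# One-shot helpers for in-memory data (Python 3.2 and later).
print(gzip.decompress(gzip.compress(b'Dennis')))   # b'Dennis'
```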

To further the illusion of normal file I/O, you can call the open(filename[, mode[,
level]]) function in the gzip module. The filename argument is required, so the
call looks very similar to Python’s built-in open function:


>>> f = gzip.open('small.gz','wb')
>>> f.write('''Old woman!
... Man!
... Old Man, sorry. What knight lives in that castle over there?
... I'm thirty-seven.
... What?
... I'm thirty-seven -- I'm not old!
... Well, I can't just call you 'Man'.
... Well, you could say 'Dennis'.''')
>>> f.close()
>>> f = gzip.open('small.gz')
>>> print f.read()
Old woman!
Man!
Old Man, sorry. What knight lives in that castle over there?
I'm thirty-seven.
What?
I'm thirty-seven -- I'm not old!
Well, I can't just call you 'Man'.
Well, you could say 'Dennis'.


zipfile 

The zipfile module lets you read, write, and get information about files stored in
the common ZIP file format.

Note The zipfile module does not currently support ZIP files with appended
comments or files that span multiple disks.

The zipfile.is_zipfile(filename) function returns true if the given file name
appears to be a valid ZIP file.

The zipfile module defines the ZipFile, ZipInfo, and PyZipFile classes.


The ZipFile class 

This class is the primary one used to read and write a ZIP file. You create a ZipFile
instance object by calling the ZipFile(filename[, mode[, compression]])
constructor:

>>> import zipfile
>>> z = zipfile.ZipFile('room.zip')
>>> z.printdir() # Print formatted summary of the archive
File Name                Modified             Size
world             2000-09-05 09:25:14        10919
cryst.cfg         1999-03-07 06:14:34           27

The mode is ‘r’ (read, the default), ‘w’ (write), or ‘a’ (append). If you append to a ZIP
file, Python adds new files to it. If you append to a non-ZIP file, however, Python
adds a ZIP archive to the end of the file. Not all ZIP readers can understand this
format. The compression argument is either ZIP_STORED (no compression) or
ZIP_DEFLATED (use compression).

The namelist() method of your ZipFile object returns the list of files the ZIP
contains. You can get a ZipInfo object (described in the next section) for any file
via the getinfo(name) method, or you can get a list of ZipInfos for the entire
archive with the infolist() method:

>>> z.namelist()
['world', 'cryst.cfg'] # The ZIP contains two files.
>>> z.getinfo('world') # Get some info for file named 'world'.
<zipfile.ZipInfo instance at 010FD14C>
>>> z.getinfo('world').file_size
10919
>>> z.infolist()
[<zipfile.ZipInfo instance at 010FD14C>,
<zipfile.ZipInfo instance at 010E1160>]

If you open the ZIP in read or append mode, read(name) decompresses the specified
file and returns its contents:

>>> print z.read('cryst.cfg')

[World] 

MIXLIGHTS=true_rgb 

The testzip() method returns the name of the first corrupt file or None if all files
are okay:

>>> z.testzip()
'world' # The file called 'world' is corrupt.

For ZIPs opened in write or append mode, the writestr(zinfo, bytes) method
adds a new file to the archive. bytes contains the content of the file, and zinfo
is a ZipInfo object (see the next section) with the file’s information. You don’t
have to fill in every attribute of ZipInfo, but at least fill in the file name and
compression type.

The write(filename[, arcname[, compress_type]]) function adds the contents
of the file filename to the archive. If you supply a value for arcname, that is
the name of the file stored in the archive. If you supply a value for compress_type,
it overrides whatever compression type you used when you created the ZipFile.

After making any changes to a ZIP file, calling the close() method is essential to
guaranteeing the integrity of the archive.
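A round trip through an archive in Python 3 looks like this. The sketch keeps the whole archive in an io.BytesIO buffer (ZipFile accepts any seekable file-like object). It uses with blocks, which call close() for you and so write the archive's central directory. The member names and contents are made up for illustration.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as z:
    z.writestr('cryst.cfg', '[World]\nMIXLIGHTS=true_rgb\n')
    z.writestr('notes.txt', 100 * 'spam ')
# Leaving the with block called close(), completing the archive.

print(zipfile.is_zipfile(buf))   # True (file-like objects accepted since 3.1)
with zipfile.ZipFile(buf) as z:  # mode defaults to 'r'
    names = z.namelist()
    cfg = z.read('cryst.cfg')
    bad = z.testzip()
print(names)                     # ['cryst.cfg', 'notes.txt']
print(cfg)                       # b'[World]\nMIXLIGHTS=true_rgb\n'
print(bad)                       # None: no corrupt members
```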

A ZipFile object has a debug attribute that you can use to change the level of
debug output messages. The most output comes with a value of 3; the least (no
output) comes with a value of 0, the default.

The ZipInfo class

Information about each member of a ZIP archive is represented by a ZipInfo
object. You can use the ZipInfo([filename[, date_time]]) constructor to create
one; getinfo() and infolist() also return ZipInfo objects. The filename
should be the full path of the file and date_time is a six-tuple containing the last
modification timestamp (see the date_time attribute in Table 12-3).

Each ZipInfo instance object has many attributes; the most useful are listed in
Table 12-3.
Table 12-3
ZipInfo Instance Attributes

Name              Description
filename          Name of the archived file
compress_size     Size of the compressed file
file_size         Size of the original file
date_time         Last modification date and time, a six-tuple consisting of
                  year, month (1-12), day (1-31), hour (0-23), minute (0-59),
                  second (0-59)
compress_type     Type of compression (stored or deflated)
CRC               The CRC32 of the original file
comment           Comment for this entry
extract_version   Minimum software version needed to extract the archive
header_offset     Byte offset to the file's header
file_offset       Byte offset to the file's data
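Several of these attributes can be seen in one Python 3 round trip. The sketch builds a ZipInfo by hand, passes it to writestr, and then reads the attributes back. ZIP timestamps have two-second resolution, so the example uses an even number of seconds; the member name and data are illustrative only.

```python
import io
import zipfile
import zlib

data = b'MIXLIGHTS=true_rgb\n' * 50

# Describe the member up front: name, timestamp, and compression type.
info = zipfile.ZipInfo('world', date_time=(2000, 9, 5, 9, 25, 14))
info.compress_type = zipfile.ZIP_DEFLATED

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr(info, data)

with zipfile.ZipFile(buf) as z:
    member = z.getinfo('world')

print(member.filename, member.date_time)   # world (2000, 9, 5, 9, 25, 14)
print(member.file_size)                    # 950
print(member.compress_size < member.file_size)   # True
print(member.CRC == zlib.crc32(data))      # True
```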


The PyZipFile class 

The PyZipFile class is a utility class for creating ZIP files that contain Python modules
and packages. PyZipFile is a subclass of ZipFile, so its constructor and
methods are the same as for ZipFile.

The only method that PyZipFile adds is writepy(pathname), which searches for
*.py files and adds their corresponding bytecode files to the ZIP file. For each
Python module (for example, file.py), writepy archives file.pyo if it exists. If not, it
adds file.pyc if it exists. If that doesn’t exist either, writepy compiles the module to
create file.pyc and adds it to the archive.

If pathname is the name of a package directory (a directory containing the __init__.py
file), writepy searches that directory and all package subdirectories for all *.py files.
If pathname is the name of an ordinary directory, it searches for *.py files in that
directory only. Finally, if pathname is just a normal Python module (for example,
file.py), writepy adds its bytecode to the ZIP file.
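PyZipFile still exists in Python 3, where writepy stores .pyc files (Python 3.5 and later no longer produce .pyo files). A minimal sketch that writes a throwaway module to a temporary directory and archives it in memory. The module name hello.py is hypothetical, and the exact archive name is printed rather than guaranteed.

```python
import io
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway module to archive (the name hello.py is hypothetical).
    module_path = os.path.join(tmp, 'hello.py')
    with open(module_path, 'w') as f:
        f.write('def greet():\n    return "hi"\n')

    buf = io.BytesIO()
    with zipfile.PyZipFile(buf, 'w') as pz:
        pz.writepy(module_path)     # compiles hello.py, then stores the bytecode
        names = pz.namelist()

print(names)                        # typically ['hello.pyc']
```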

Cross-Reference Refer to Chapter 6 for more information on Python packages.
Summary

Python makes a breeze of serializing or marshaling objects to disk or over a net- 
work, and its support for compression and data conversion only makes life easier. 
In this chapter you: 

♦ Serialized objects.

♦ Transported objects across a network connection.

♦ Converted objects to formats readable by C programs.

♦ Stored objects in the standard XDR format.

♦ Compressed data to save space.

In the next chapter you’ll learn to track how long parts of your program take to run, 
retrieve the date and time, and print the date and time in custom formats. 




Accessing Date 
and Time 


Dates can be written in many ways. Converting between
date formats is a common chore for computers. Date
arithmetic, like finding the number of days between June 10
and December 13, is another common task. Python’s time
and calendar modules help track dates and times. They even
handle icky details like daylight saving time and leap years.


Telling Time in Python 

Time is usually represented as either a number or a tuple. The
time module provides functions for working with times, and
for converting between representations.

Ticks 

You can represent a point in time as a number of “ticks”: the
number of seconds that have elapsed since the epoch. The
epoch is an arbitrarily chosen “beginning of time.” For UNIX
and Windows systems, the epoch is midnight UTC, January 1, 1970.
For example, on my computer, my next birthday is 983347200 in
ticks (which translates into February 28, 2001).

The function time.time returns the current system time in
ticks. For example, here is the number of seconds from now until
my birthday:

>>> 983347200 - time.time()
7186162.7339999676
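Because ticks are plain seconds, converting them to days is one division (the 7186162.73 seconds above is roughly 83 days). The Python 3 sketch below also uses time.gmtime to check that 983347200 ticks falls on February 28, 2001: at 8 a.m. UTC, which is midnight at UTC-8. Run today, the countdown comes out negative because that date has passed.

```python
import time

BIRTHDAY = 983347200                   # the example date, in ticks

# gmtime turns ticks into a UTC time tuple: 8 a.m. UTC is midnight in UTC-8.
print(time.gmtime(BIRTHDAY)[:6])       # (2001, 2, 28, 8, 0, 0)

seconds_left = BIRTHDAY - time.time()  # negative once the date has passed
days_left = seconds_left / (24 * 60 * 60)
print('%.1f days' % days_left)
```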



In This Chapter

Telling time in Python
Converting between time formats
Parsing and printing dates and times
Accessing the calendar
Using time zones
Allowing two-digit years


Note that Python uses a floating-point value for ticks. Because
time precision varies by operating system, the value of
time.time is always a whole number on some systems.
Date arithmetic is easy to do with ticks. However, dates before the epoch cannot be
represented in this form. Dates in the far future also cannot be represented this
way: on UNIX and Windows systems that store ticks as a signed 32-bit integer, the
cutoff point is in January 2038.

Note Third-party modules such as mxDateTime provide date/time classes that function
outside the range 1970-2038.

TimeTuple 

Many of Python’s time functions handle time as a tuple of 9 numbers, as shown in 
Table 13-1: 


Table 13-1
Time Functions

Index   Field              Values
0       4-digit year       1993
1       Month              1-12
2       Day                1-31
3       Hour               0-23 (0 is 12 a.m.)
4       Minute             0-59
5       Second             0-61 (60 or 61 are leap-seconds)
6       Day of week        0-6 (0 is Monday)
7       Day of year        1-366 (Julian day)
8       Daylight saving    -1, 0, 1


Note that the elements of the tuple proceed from broadest (year) to most granular 
(second). This means that one can do linear comparisons on TimeTuples: 

>>> TimeA = (1972, 5, 15, 12, 55, 32, 0, 136, 1)
>>> TimeB = (1972, 5, 16, 7, 9, 10, 1, 137, 1)
>>> TimeA<TimeB # TimeA is a day before TimeB.
1

Note that a TimeTuple does not include a time zone. To pinpoint an actual time, one 
needs a time zone as well as a TimeTuple. 
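Since Python 2.2, the time functions return time.struct_time objects instead of bare tuples. They still behave like 9-tuples but also name their fields, which makes code that reads them easier to follow. A short Python 3 sketch using the two TimeTuples from the example:

```python
import time

time_a = time.struct_time((1972, 5, 15, 12, 55, 32, 0, 136, 1))
time_b = time.struct_time((1972, 5, 16, 7, 9, 10, 1, 137, 1))

# Named fields instead of numeric indexes.
print(time_a.tm_year, time_a.tm_mon, time_a.tm_mday)   # 1972 5 15
print(time_a.tm_wday == 0)                             # True: a Monday

# Field-by-field comparison works just as it does for plain tuples.
print(tuple(time_a) < tuple(time_b))                   # True
```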


Stopwatch time 

The clock function acts as a stopwatch for timing Python code: you call clock
before doing something, call it again afterwards, and take the difference between
numbers to get the elapsed seconds. The actual values returned by clock are
system-dependent and generally don’t translate into a time-of-day. This code
checks how quickly Python counts to one million:


>>> def CountToOneMillion():
...     StartTime=time.clock()
...     for X in xrange(0,1000000): pass
...     EndTime=time.clock()
...     print EndTime-StartTime
...
>>> CountToOneMillion() # Elapsed time, in seconds
0.855862726726

The proper way to pause execution is with time.sleep(n), where n is a floating-point
number of seconds. In a Tkinter application, you can call the after
method on the root object to make a function execute after n milliseconds. (See
Chapter 19 for more on Tkinter.)
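time.clock was deprecated in Python 3.3 and removed in Python 3.8. The usual stopwatch now is time.perf_counter, which, like clock, is only meaningful as a difference between two readings. A Python 3 version of the counting example, plus a measured sleep:

```python
import time

def count_to_one_million():
    """Return how many seconds an empty million-step loop takes."""
    start = time.perf_counter()
    for _ in range(1000000):
        pass
    return time.perf_counter() - start

print('%.3f seconds' % count_to_one_million())

# time.sleep takes seconds as a float; the pause is at least that long.
start = time.perf_counter()
time.sleep(0.05)
slept = time.perf_counter() - start
print('slept %.3f seconds' % slept)
```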


Converting Between Time Formats 

The function localtime converts from ticks to a TimeTuple for the local time zone. 
For example, this code gets the current time: 

>>> time.localtime(time.time())

(2000, 12, 6, 20, 0, 9, 2, 341, 0) 

Reading the fields of the TimeTuple, I can see that it is the year 2000, December 6,
at 20:00 (8 p.m.) and 9 seconds. The day of the week is 2 (Wednesday), it is the
341st day of the year, and local clocks are not currently on Daylight Saving Time.

The function gmtime also converts from ticks to a TimeTuple, but it returns the
TimeTuple for UTC (Coordinated Universal Time, formerly Greenwich Mean
Time). This call to gmtime shows that it is 4 a.m. in England (a bad time to telephone):

>>> time.gmtime(time.time()) 

(2000, 12, 7, 4, 4, 9, 3, 342, 0) 

The function mktime converts from a TimeTuple to ticks. It interprets the
TimeTuple according to the local time zone. The function mktime is the inverse of
localtime, and it is useful for doing date arithmetic. (The inverse function of
gmtime is calendar.timegm.) This code finds the number of seconds between two
points in time:

>>> TimeA = (1972, 5, 15, 12, 55, 32, 0, 136, 1) 

>>> TimeB = (1972, 5, 16, 7, 9, 10, 1, 137, 1) 

>>> time.mktime(TimeB)-time.mktime(TimeA) 

65618.0 

>>> _ / (60*60) # How many hours is that? 

18.227222222222224 
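The two inverse pairs mentioned above are easy to confirm in Python 3. Doing the same arithmetic with calendar.timegm instead of mktime treats the tuples as UTC, which sidesteps local daylight-saving rules. The result is again 65618 seconds, about 18.23 hours.

```python
import calendar
import time

ticks = 983347200

# gmtime and calendar.timegm are inverses (both work in UTC).
print(calendar.timegm(time.gmtime(ticks)) == ticks)   # True

# localtime and mktime are inverses in the local time zone.
print(time.mktime(time.localtime(ticks)) == ticks)    # True

# The earlier date arithmetic, done in UTC.
time_a = (1972, 5, 15, 12, 55, 32, 0, 136, 0)
time_b = (1972, 5, 16, 7, 9, 10, 1, 137, 0)
seconds = calendar.timegm(time_b) - calendar.timegm(time_a)
hours = seconds / (60 * 60)
print(seconds, round(hours, 2))                       # 65618 18.23
```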





222 Part II > Files, Data Storage, and Operating System Services 


Parsing and Printing Dates and Times 

The asctime function takes a TimeTuple, and returns a human-readable timestamp. 
It is especially useful in log files: 

>>> Now=time.localtime(time.time()) # Now is a TimeTuple.
>>> time.asctime(Now)
'Sun Dec 10 10:09:41 2000'
>>> # In version 2.1, you can call asctime() and localtime()
>>> # with no arguments to use the current time:

>>> time.asctime() 

'Sun Dec 10 10:09:41 2000' 

The function ctime returns a timestamp for a time expressed in ticks: 

>>> time.ctime(time.time()) 

'Sun Dec 10 10:11:29 2000' 


Fancy formatting 

The function strftime(format, timetuple) formats a TimeTuple in a format you
specify. The function strftime returns the string format after performing substitutions
on various codes marked with a percent sign, as shown in Table 13-2:


Table 13-2
Time Formatting Syntax

Code   Substitution                                Example / Range
%a     Abbreviated day name                        Thu
%A     Full day name                               Thursday
%b     Abbreviated month name                      Jan
%B     Full month name                             January
%c     Date and time representation                12/10/00 10:09:41
       (equivalent to %x %X)
%d     Day of the month                            01-31
%H     Hour (24-hour clock)                        00-23
%I     Hour (12-hour clock)                        01-12
%j     Julian day (day of the year)                001-366
%m     Month                                       01-12
%M     Minute                                      00-59
%p     A.M. or P.M.                                AM
%S     Second                                      00-61
%U     Week number. Week starts with Sunday;       00-53
       days before the first Sunday of the
       year are in week 0.
%w     Weekday as a number (0=Sunday)              0-6
%W     Week number. Week starts with Monday;       00-53
       days before the first Monday of the
       year are in week 0.
%x     Date                                        12/10/00
%X     Time                                        10:09:41
%y     2-digit year                                00-99
%Y     4-digit year                                2000
%Z     Time-zone name                              Pacific Standard Time
%%     Literal % sign                              %


For example, I can print the current week number: 

>>> time.strftime("It's week %W!",Now)
"It's week 49!"

Here is the default formatting string (with the same results as calling ctime):

>>> time.strftime("%a %b %d %H:%M:%S %Y",Now)
'Sun Dec 10 10:09:41 2000'
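A few of the table's codes, applied in Python 3 to the TimeTuple from this section (Sunday, December 10, 2000, 10:09:41). The numeric codes are the same in every locale. %p is locale-dependent, so its output is shown for the default C locale only. time.asctime always uses English names.

```python
import time

now = (2000, 12, 10, 10, 9, 41, 6, 345, 0)   # the TimeTuple from the text

print(time.asctime(now))                        # Sun Dec 10 10:09:41 2000
print(time.strftime('%Y-%m-%d %H:%M:%S', now))  # 2000-12-10 10:09:41
print(time.strftime('day %j, week %W', now))    # day 345, week 49
print(time.strftime('%I:%M %p', now))           # 10:09 AM (C locale)
```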


Parsing time 

The function strptime(timestring[, format]) is the reverse of strftime; it
parses a string and returns a TimeTuple. It guesses at any unspecified time components.
It raises a ValueError if it cannot parse the string timestring using the format
format. The default format is the one that ctime uses: “%a %b %d %H:%M:%S %Y”.

Note The strptime function is available on most UNIX systems; however, it is
unavailable on Windows.
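From Python 2.3 onward, strptime is implemented in Python itself, so the sketch below runs on every platform, Windows included. It shows that strptime fills in the weekday and day of the year, that the default format matches asctime/ctime output, and that a mismatch raises ValueError.

```python
import time

t = time.strptime('2000-12-10 10:09:41', '%Y-%m-%d %H:%M:%S')
print(t.tm_wday, t.tm_yday)   # 6 345: weekday and day of year are computed

# Without a format, strptime expects the asctime()/ctime() layout.
print(time.strptime('Sun Dec 10 10:09:41 2000')[:6])   # (2000, 12, 10, 10, 9, 41)

try:
    time.strptime('10 Dec 2000', '%Y-%m-%d')
except ValueError as err:
    print('cannot parse:', err)
```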

Localization 

Different countries write dates differently. For example, the string “2/5” means
“February 5” in the United States, but “May 2” in England. The function strftime
refers to the current locale when performing each substitution. For example, the
format string “%x” uses the correct day-month ordering for the current locale.
However, you still need to take locale into account when writing code; for
instance, the format string “%m/%d” is not correct for all locales.



See Chapter 34 for an overview of the locale module and other information on
internationalization.
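To see %x follow the locale, switch LC_TIME temporarily. A Python 3 sketch: format_in_locale is a hypothetical helper. Because setlocale changes process-wide state, the helper restores the previous setting. The German locale name is an assumption; it works only if that locale is installed, and otherwise setlocale raises locale.Error.

```python
import locale
import time

def format_in_locale(fmt, timetuple, locale_name):
    """Format timetuple with strftime under a temporary LC_TIME locale."""
    saved = locale.setlocale(locale.LC_TIME)      # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
        return time.strftime(fmt, timetuple)
    finally:
        locale.setlocale(locale.LC_TIME, saved)   # always restore it

now = (2000, 12, 10, 10, 9, 41, 6, 345, 0)
print(format_in_locale('%x', now, 'C'))           # 12/10/00
try:
    print(format_in_locale('%x', now, 'de_DE.UTF-8'))   # typically 10.12.2000
except locale.Error:
    print('German locale not installed')
```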


Accessing the Calendar 

The calendar module provides high-level functions and constants that complement
the lower-level functions in the time module. Because calendar uses ticks
internally to represent dates, it cannot provide calendars outside the range that
ticks can represent (usually 1970-2038).

Printing monthly and yearly calendars 

The following sections show examples of printing monthly and yearly calendars. 


monthcalendar(yearnum,monthnum) 

The function monthcalendar returns a list of lists, representing a monthly calendar.
Each entry in the main list represents a week. The sublists contain the seven
dates in that week. A 0 (zero) in the sublist represents a day from the previous or
next month:

>>> calendar.monthcalendar(2000,5) # 4 1/2 weeks in May, 2000
[[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14], [15, 16,
17, 18, 19, 20, 21], [22, 23, 24, 25, 26, 27, 28], [29, 30, 31,
0, 0, 0, 0]]

month(yearnum,monthnum[,width[,linesperweek]]) 

The month function returns a multiline string that looks like a monthly calendar for
month monthnum of year yearnum. Months are numbered normally (from 1 for
January up to 12 for December). The parameter width specifies how wide each column
is; the minimum (and default) value is 2. The parameter linesperweek specifies
how many rows to print for each week. It defaults to 1; setting it to a higher number
like 5 leaves space to write on a printed calendar. Here are two examples:

>>> print calendar.month(2002,5)
      May 2002
Mo Tu We Th Fr Sa Su
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
>>> # 2 rows per week; 3 cols per day
>>> print calendar.month(2002,5,3,2)
          May 2002

Mon Tue Wed Thu Fri Sat Sun

          1   2   3   4   5

  6   7   8   9  10  11  12

 13  14  15  16  17  18  19

 20  21  22  23  24  25  26

 27  28  29  30  31




The function prmonth(yearnum,monthnum[,width[,linesperweek]]) prints the
corresponding output of month.
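The same functions still exist in Python 3, where the width and linesperweek parameters are named w and l. This sketch assumes Python 3 and the default C locale for day names. It checks that prmonth prints exactly the text that month returns.

```python
import calendar
import io
from contextlib import redirect_stdout

# month() returns the text: 3 columns per day, 2 lines per week.
text = calendar.month(2002, 5, w=3, l=2)

# prmonth() prints the same text; capture stdout to compare.
captured = io.StringIO()
with redirect_stdout(captured):
    calendar.prmonth(2002, 5, w=3, l=2)

lines = text.splitlines()
```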


calendar(yearnum[,width[,linesperweek[,columnpadding]]])

The function calendar returns a yearly calendar as a multiline string, with three
months per row. The parameters width and linesperweek function as for month. The
parameter columnpadding indicates how many spaces to add between month-columns;
it defaults to 6. The function prcal prints the corresponding output of calendar.
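To check how width and columnpadding combine, the following Python 3 sketch measures the widest line of a yearly calendar. Python 3 calls columnpadding c. With the default width of 2, each month column is 7 * (2 + 1) - 1 = 20 characters wide. Three columns plus two gaps of c spaces therefore give 72 characters for c=6 and 64 for c=2. The helper widest_line is just for illustration.

```python
import calendar

default_year = calendar.calendar(2002)       # w=2, l=1, c=6
tight_year = calendar.calendar(2002, c=2)    # narrower gaps between months

def widest_line(text):
    """Length of the longest line in a multiline string."""
    return max(len(line) for line in text.splitlines())
```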

Calendar Information 

The weekday function looks up the day of the week for a particular date. The syntax
is weekday(year, month, day). Weekdays range from Monday (0) to Sunday (6).
Constants for each day (in all caps) are available, for convenience and code clarity:

>>> # Is May 1, 2002 a Wednesday? 

>>> calendar.weekday(2002,5,1)==calendar.WEDNESDAY 
1 

The function monthrange(yearnum,monthnum) returns a two-tuple: the weekday
of the first day of month monthnum in year yearnum, and the length of the month.

>>> calendar.monthrange(2000,2) # 2000 was a leap year! 

(1, 29) 
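A common use of monthrange is finding the last day of a month without hard-coding leap-year rules. This Python 3 sketch repeats the checks above. The function last_day_of_month is a hypothetical helper.

```python
import calendar

# weekday() numbers days Monday=0 ... Sunday=6, matching the constants.
is_wednesday = calendar.weekday(2002, 5, 1) == calendar.WEDNESDAY

# monthrange() -> (weekday of the 1st, number of days in the month)
first_weekday, days_in_feb = calendar.monthrange(2000, 2)

def last_day_of_month(year, month):
    """Hypothetical helper: the day number of the month's last day."""
    return calendar.monthrange(year, month)[1]
```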

By default, calendar starts its weeks on Monday, and ends them on Sunday. I like
this setting best, because the week ends with the weekend. But you can start your
calendar’s weeks on another day by calling setfirstweekday(weekday). The function
firstweekday tells you which day of the week is currently the first day of the
week:
>>> calendar.setfirstweekday(calendar.WEDNESDAY)
>>> print calendar.month(2002,5)
      May 2002
We Th Fr Sa Su Mo Tu
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31
>>> calendar.firstweekday() # Weeks start with day #2 (Wed.)
2
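The following Python 3 sketch repeats the Wednesday example and then restores Monday. This matters because setfirstweekday changes module-wide state that every later calendar call sees.

```python
import calendar

calendar.setfirstweekday(calendar.WEDNESDAY)
try:
    first = calendar.firstweekday()                    # 2 means Wednesday
    header = calendar.month(2002, 5).splitlines()[1]   # day-name row
    first_week = calendar.monthcalendar(2002, 5)[0]
finally:
    calendar.setfirstweekday(calendar.MONDAY)          # restore the default
```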


Leap years 

The function isleap(yearnum) returns true if year yearnum is a leap year. The
function leapdays(firstyear,lastyear) returns the number of leap days from
firstyear up to, but not including, lastyear.
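Because the range excludes lastyear, you count the leap years in a whole century by passing the year after it as lastyear. Here is a quick Python 3 check:

```python
import calendar

century_leap = calendar.isleap(2000)         # divisible by 400: leap
century_plain = calendar.isleap(1900)        # divisible by 100 only: not leap

# leapdays counts leap years from y1 up to, but not including, y2.
count_1900s = calendar.leapdays(1900, 2000)  # 1904, 1908, ..., 1996
count_2000_only = calendar.leapdays(2000, 2001)
```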


Using Time Zones 

The value time.daylight indicates whether a local DST (Daylight Saving Time)
time zone is defined. A value of 1 indicates that a DST time zone is available.

The value time.timezone is the offset, in seconds, from the local (standard) time
zone to UTC. This makes it easy to convert between time zones. The value
time.altzone is the offset from the local DST time zone to UTC. Use altzone instead
of timezone only while daylight saving time is in effect; it is available only if
time.daylight is 1. In December, standard time applies, so this example uses
timezone:

>>> Now=time.time()
>>> time.ctime(Now) # Time in Mountain time zone, USA
'Sun Dec 10 10:44:49 2000'
>>> time.ctime(Now+time.timezone) # Time in England
'Sun Dec 10 17:44:49 2000'

The value time.tzname is a tuple. The first entry is the name of the local time
zone. The second entry, if available, is the name of the local Daylight Saving Time
time zone. The second entry is available only if time.daylight is nonzero. For
example:

>>> time.tzname 

('Pacific Standard Time', 'Pacific Daylight Time') 
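Because altzone applies only during daylight saving time, code that needs the current offset has to check whether DST is in effect. This Python 3 sketch picks the offset using the tm_isdst flag of the local time tuple. The function local_utc_offset is a hypothetical helper, and the actual values depend on the machine's time zone settings. Adding an offset and calling ctime can go wrong near a DST switch, so for UTC itself, gmtime is the more direct route.

```python
import time

def local_utc_offset(t=None):
    """Hypothetical helper: seconds west of UTC for local time at t.

    time.timezone describes standard time; time.altzone applies only
    while daylight saving time is actually in effect.
    """
    if time.daylight and time.localtime(t).tm_isdst > 0:
        return time.altzone
    return time.timezone

now = time.time()
offset = local_utc_offset(now)
# tzname[0] is the standard name, tzname[1] the DST name.
zone_name = time.tzname[time.localtime(now).tm_isdst > 0]
# gmtime converts straight to UTC without any offset arithmetic.
utc_text = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
```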
Allowing Two-Digit Years

Two-digit dates are convenient, but they can be ambiguous. For example, the year
“97” should precede the year “03” if the years are 1997 and 2003, but not if they are
1997 and 1903.

In 1999, programmers around the world began rooting through legacy code to solve
the Y2K Bug, a blanket term for all bugs caused by indiscriminate use of two-digit
years. Some people worried that the Y2K Bug would cause The End Of The World
As We Know It on January 1, 2000. Fortunately, it didn’t, and we can all sleep safely
at night, at least until 2038, when 32-bit epoch-based time starts to overflow.

Normally, Python adds 2000 to a two-digit year from 00 to 68, and adds 1900 to
two-digit years from 69 to 99. However, for paranoia’s sake, the value
time.accept2dyear can be set to 0; this setting causes all two-digit years to be
rejected. If you set the environment variable PYTHONY2K, the value
time.accept2dyear is initialized to 0. For example:

>>> Y4=(2000, 12, 10, 10, 9, 41, 6, 345, 0)
>>> Y2=(00, 12, 10, 10, 9, 41, 6, 345, 0) # Same date
>>> time.mktime(Y4)
976471781.0
>>> time.mktime(Y2) # 2-digit year below 69; add 2000
976471781.0
>>> time.accept2dyear=0 # Zero tolerance for YY!
>>> time.mktime(Y2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: year >= 1900 required
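time.accept2dyear was removed in Python 3.3, so modern code usually expands two-digit years explicitly. This sketch applies the same 68/69 pivot described above. The function expand_two_digit_year is a hypothetical helper. The sketch compares the helper against time.strptime, whose %y directive follows the same POSIX rule.

```python
import time

def expand_two_digit_year(yy):
    """Hypothetical helper: 00-68 -> 2000-2068, 69-99 -> 1969-1999."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy <= 68 else 1900 + yy

# strptime's %y uses the same pivot when parsing two-digit years.
parsed_68 = time.strptime("12/10/68", "%m/%d/%y").tm_year
parsed_69 = time.strptime("12/10/69", "%m/%d/%y").tm_year
```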


Summary 

Python includes standard libraries for telling time, doing date arithmetic, and
converting between time zones. In this chapter, you:

♦ Converted time between tuple and ticks representations.

♦ Formatted and parsed times in human-readable formats.

♦ Checked months and days on a yearly calendar.

♦ Handled various time zones, as well as Daylight Saving Time.

In the next chapter you will learn how to use Python to store and retrieve data from 
databases. 



Using Databases 




Databases support permanent storage of large amounts of data. You can easily
perform CRUD (Create, Read, Update, and Delete) operations on database records.
Relational databases divide data between tables and support sophisticated SQL
operations.

Python’s standard libraries include a simple disk-dictionary database. The Python
DB API provides a standard way to access relational databases. Various third-party
modules implement this API, providing easy access to many flavors of database,
including Oracle and MySQL.


Using Disk-Based Dictionaries 

Python’s standard libraries provide a simple database that takes the form of a
single disk-based dictionary (or disktionary). This functionality is based on the
UNIX utility dbm; on UNIX, you can access databases created by the dbm utility.
Several modules define such a database, as shown in Table 14-1.


Table 14-1
Disk-Based Dictionary Modules

Module     Description
anydbm     Portable database; chooses the best module from among the others
dumbdbm    Slow and limited, but available on all platforms
dbm        Wraps the UNIX dbm utility; available on UNIX only
gdbm       Wraps GNU's improved dbm; available on UNIX only
dbhash     Wraps the BSD database library; available on UNIX and Windows


In This Chapter

Using disk-based dictionaries
DBM example: tracking telephone numbers
Advanced disk-based dictionaries
Accessing relational databases
Example: "sounds-like" queries
Examining relational metadata
Example: creating auditing tables
Advanced features of the DB API
In general, you should use anydbm, because it is available on any platform (even
if it has to fall back on dumbdbm!).

Each dbm module defines a dbm object and an exception named error. The features
in this section are available from every flavor of dbm; the “Advanced Disk-Based
Dictionaries” section describes extended features not available in dumbdbm.

The open function creates a new dbm object. The function’s syntax is
open(filename[,flag[,mode]]). The filename parameter is the path to the file used
to store the data. The flag parameter is normally optional, but is required for
dbhash. It has the following legal values:

r [default] Opens the database for read-only access 

w Opens the database for read and write access 
c Same as w, but creates the database file if necessary 

n Same as w, but always creates a new, empty database file 

Note The flag parameter is required for dbhash.open. 


Caution Some flavors of dbm (including dumbdbm) permit modifications to a database 
opened read-only! 

The optional parameter mode specifies the UNIX-style permissions to set on the 
database file. 


Once you have opened a database, you can access it much like a standard dictionary:

>>> SimpleDB=anydbm.open("test","c") # create a new datafile
>>> SimpleDB["Terry"]="Gilliam" # add a record
>>> SimpleDB["John"]="Cleese"
>>> print SimpleDB["Terry"] # access a record
Gilliam
>>> del SimpleDB["John"] # delete a record


The keys and values in a dbm must all be strings. For example:

>>> SimpleDB["Eric"]=5 # illegal; value is not a string!
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bsddb value type must be string


Attempting to access a key with no value raises a KeyError exception. You can use 
the has_key method to verify that a key exists, or call keys to get a list of keys. 
However, the safe get method from a dictionary is not available: 
>>> SimpleDB.keys()
['Terry']
>>> SimpleDB.has_key("Eric")
0

When you are finished with a dbm object, call its close method to sync it to disk
and free its used resources.
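In Python 3, anydbm became dbm and dumbdbm became dbm.dumb. This sketch uses dbm.dumb so that the result does not depend on which libraries are installed, and a temporary directory stands in for the datafile location. There are two differences from the Python 2 session above. First, values come back as bytes. Second, the dictionary-style get is now available, and the in operator replaces has_key.

```python
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test")
    with dbm.dumb.open(path, "c") as simple_db:   # create if necessary
        simple_db["Terry"] = "Gilliam"            # str is stored as UTF-8 bytes
        simple_db["John"] = "Cleese"
        terry = simple_db["Terry"]                # values come back as bytes
        del simple_db["John"]
        keys = simple_db.keys()
        has_eric = "Eric" in simple_db            # Python 3 spelling of has_key
        eric = simple_db.get("Eric")              # None instead of KeyError
        try:
            simple_db["Eric"] = 5                 # not a string: rejected
            rejected = False
        except TypeError:
            rejected = True
```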


DBM Example: Tracking Telephone Numbers 

The example shown in Listing 14-1 uses a dbm object to track telephone numbers.
The dictionary key is a person’s name; the value is his or her telephone number.


Listing 14-1: Phone list

import anydbm
import sys

def AddName(DB):
    print "Enter a name. (Null name to cancel)"
    # Take the [:-1] slice to remove the \n at the end
    NewName=sys.stdin.readline()[:-1]
    if (NewName==""): return
    print "Enter a phone number."
    PhoneNumber=sys.stdin.readline()[:-1]
    DB[NewName]=PhoneNumber # Poke value into database!

def PrintList(DB):
    # Note: A large database may have MANY keys (too many to
    # casually put into memory). See Listing 14-2 for a better
    # way to iterate over keys in dbhash.
    for Key in DB.keys():
        print Key,DB[Key]

if (__name__=="__main__"):
    PhoneDB=anydbm.open("phone","c")
    while (1):
        print "\nEnter a name to look up\n+ to add a name"
        print "* for a full listing\n. to exit"
        Command=sys.stdin.readline()[:-1]
        if (Command==""):
            continue # Nothing to do; prompt again
        if (Command=="+"):
            AddName(PhoneDB)
        elif (Command=="*"):
            PrintList(PhoneDB)
        elif (Command=="."):
            break # quit!
        else:
            try:
                print PhoneDB[Command]
            except KeyError:
                print "Name not found."
    print "Saving and closing..."
    PhoneDB.close()


Advanced Disk-Based Dictionaries 

The various flavors of dbm don’t use compatible file formats; for example, a
database created using dbhash cannot be read using gdbm. This means that the
only database file format available on all platforms is that used by dumbdbm. The
whichdb module can examine a database to determine which flavor of dbm created
it. The function whichdb.whichdb(filename) returns the name of the module that
created the datafile filename, returns None if the file is unreadable or does not exist,
and returns an empty string if it can’t figure out the file’s format. For example, the
following code uses anydbm to create a database, and then queries the database to
see what type it really is:

>>> MysteryDB=anydbm.open("Unknown","c")
>>> MysteryDB.close() # write file so we can check its db-type
>>> whichdb.whichdb("Unknown")
'dbhash'
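In Python 3, the whichdb module became the function dbm.whichdb, which reports full module names such as 'dbm.dumb'. Here is a minimal sketch that again uses dbm.dumb so the expected answer is predictable. The file names are arbitrary.

```python
import dbm
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Unknown")
    with dbm.dumb.open(path, "c") as mystery_db:
        mystery_db["key"] = "value"     # closing writes the index file
    kind = dbm.whichdb(path)            # name of the module that made it
    missing = dbm.whichdb(os.path.join(tmp, "nothing-here"))
```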

dbm 

The dbm module provides an extra string variable, library, which is the name of
the underlying ndbm implementation.

gdbm 

The gdbm module provides improved key navigation. The dbm method firstkey
returns the first key in the database; the method nextkey(currentkey) returns
the key after currentkey. After doing many deletions from a gdbm database, you can
call reorganize to free up space used by the datafile. In addition, the method sync
flushes any unwritten changes to disk.
dbhash

The dbhash module also provides key navigation. The dbm methods first and
last return the first and last entries, respectively. The methods next and previous
take no arguments; they return the entry after or before the current position in
the traversal. In addition, the method sync flushes any unwritten changes to disk.

Databases can be very large, so the list of all keys returned by the keys method
of a database may eat a lot of memory. The key-navigation methods provided by
gdbm and dbhash enable you to iterate over all keys without loading them all into
memory. The code in Listing 14-2 is an improved replacement for the PrintList
function in the previous telephone list example.


Listing 14-2: Improved list iteration with dbhash

def PrintList(DB):
    Record=None
    try:
        # first() raises a KeyError if there are no entries
        Record = DB.first()
    except KeyError:
        return # Zero entries
    while 1:
        print Record
        try:
            # next() raises a KeyError if no next entry
            Record = DB.next()
        except KeyError:
            return # all done!


Using BSD database objects 

The bsddb module, available on UNIX and Windows, provides access to the
Berkeley DB library. It provides hashtable, b-tree, and record objects for data
storage. The three constructors, hashopen, btopen, and rnopen, take the same
parameters (filename, flag, and mode) as the dbm constructor. The constructors
take other optional parameters, which are passed directly to the underlying BSD
code and should generally not be used.

BSD data objects provide the same functionality as dbm objects, as well as some
additional methods. The methods first, last, next, and previous navigate through
(and return) the records in the database. The records are ordered by key value for a
b-tree object; record order is undefined for a hashtable or record. In addition, the
method set_location(keyvalue) jumps to the record with key keyvalue:
>>> bob=bsddb.btopen("names","c")
>>> bob["M"]="Martin"
>>> bob["E"]="Eric"
>>> bob["X"]="Xavier"
>>> bob.first() # E is first, since this is a b-tree
('E', 'Eric')
>>> bob.next()
('M', 'Martin')
>>> bob.next()
('X', 'Xavier')
>>> bob.next() # navigating "off the edge" raises KeyError
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError
>>> bob.set_location("M")
('M', 'Martin')


The sync method of a BSD database object flushes any changes to the datafile. 


Accessing Relational Databases 

Relational databases are a powerful, flexible way to store and retrieve many kinds
of data. There are many relational database implementations, which vary in
scalability and richness of features. The standard libraries do not include relational
database support; however, Python modules exist to access almost any relational
database, including Oracle, MySQL, DB2, and Sybase.

The Python Database API defines a standard interface for Python modules that
access a relational database. Most third-party database modules conform to the API
closely, though not perfectly. This chapter covers Version 2.0 of the API.


Connection objects 

The module’s connect function constructs a database connection. The connection is
used in constructing cursors. When finished with a connection, call its close method
to free it. Databases generally provide a limited pool of connections, so a program
should not needlessly use them up.

The parameters of the connect function vary by module, but typically include dsn
(data source name), user, password, host, and database.

Transactions 

Connections oversee transactions. A transaction is a collection of actions that must
execute atomically: completely, or not at all. For example, a bank transfer might
debit one account and credit another; this should be done within a single
transaction, as performing only one half of the transfer would obviously be
unacceptable.
Calling the commit connection method completes the current transaction; calling
rollback cancels the current transaction. Not all databases support transactions;
for example, Oracle does, MySQL doesn’t (yet). The commit method is always
available; rollback is only available where transaction support is provided.
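You can sketch the bank-transfer case with sqlite3, a DB-API 2.0 module that has shipped with Python since version 2.5. Here it stands in for the third-party modules discussed above, and the code is Python 3. The ACCOUNT table and the transfer function are hypothetical. The point is that both updates are committed together or rolled back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (NAME TEXT PRIMARY KEY, BALANCE INTEGER NOT NULL)")
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Hypothetical helper: move money atomically, or not at all."""
    cur = conn.cursor()
    try:
        cur.execute("UPDATE ACCOUNT SET BALANCE = BALANCE + ? WHERE NAME = ?",
                    (amount, target))
        cur.execute("UPDATE ACCOUNT SET BALANCE = BALANCE - ? WHERE NAME = ?",
                    (amount, source))
        (balance,) = cur.execute("SELECT BALANCE FROM ACCOUNT WHERE NAME = ?",
                                 (source,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.commit()        # both updates become permanent together
    except Exception:
        conn.rollback()      # undo the half-finished transfer
        raise
    finally:
        cur.close()

def balances(conn):
    return dict(conn.execute("SELECT NAME, BALANCE FROM ACCOUNT").fetchall())

transfer(conn, "alice", "bob", 30)
try:
    transfer(conn, "bob", "alice", 500)   # would overdraw bob: rolled back
except ValueError:
    pass
```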

Cursor objects 

A cursor can execute SQL statements and retrieve data. The connection method
cursor creates and returns a new cursor. The cursor method
execute(command[,parameters]) executes the specified SQL statement command,
passing any necessary parameters. After executing a command that affects row data,
the cursor attribute rowcount indicates the number of rows altered or returned (a
module that cannot determine this sets it to -1), and the description attribute
(described in the “Examining Relational Metadata” section) describes the columns
affected. After executing a command that selects data, the method fetchone returns
the next row of data (as a sequence, with one entry for each column value). The
method fetchmany([size]) returns a sequence of rows, up to size of them. The
method fetchall returns all the remaining rows.

After using a cursor, call its close method to free it. Databases typically have a
limited pool of available cursors, so it is important to free cursors after use.
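This Python 3 sketch uses an in-memory sqlite3 database, with made-up EMPLOYEE rows, to walk through execute, rowcount, and the three fetch methods. sqlite3 reports rowcount as -1 for SELECT statements, which the DB API allows, so the example reads rowcount after an UPDATE instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (EMPLOYEE_ID INTEGER, FIRST_NAME TEXT)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                [(1, "Bob"), (2, "Sarah"), (3, "Sandy"), (4, "Pat"), (5, "Larry")])

cur.execute("UPDATE EMPLOYEE SET FIRST_NAME = upper(FIRST_NAME) "
            "WHERE EMPLOYEE_ID > 3")
changed = cur.rowcount                 # rows affected by the UPDATE

cur.execute("SELECT EMPLOYEE_ID, FIRST_NAME FROM EMPLOYEE ORDER BY EMPLOYEE_ID")
first = cur.fetchone()                 # one row, as a tuple
next_two = cur.fetchmany(2)            # a list of up to 2 rows
rest = cur.fetchall()                  # everything that is left
cur.close()                            # free the cursor
conn.close()                           # and the connection
```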


Example: "Sounds-Like" Queries 

The example shown in Listing 14-3 uses the mxODBC module to look up people
whose names “sound like” another name. ODBC is a standard interface for relational
databases; ODBC drivers are available for many databases, including Oracle
and MySQL. Therefore, the mxODBC module can handle most of the databases you
are likely to deal with. Listing 14-4 shows the output from the example.


Listing 14-3: Soundex.py

# Replace this import with the appropriate one for your system:
import ODBC.Windows

# Dictionary used for soundslike coding
SoundexDict = {"B":"1", "F":"1", "P":"1", "V":"1",
               "C":"2", "G":"2", "J":"2", "K":"2",
               "Q":"2", "S":"2", "X":"2", "Z":"2",
               "D":"3", "T":"3",
               "L":"4",
               "M":"5", "N":"5",
               "R":"6",
               "A":"7", "E":"7", "I":"7", "O":"7", "U":"7", "Y":"7",
               "H":"8", "W":"8"}

# These SQL statements may need to be tweaked for your database
# (They work with MySQL)
CREATE_EMPLOYEE_SQL = """CREATE TABLE EMPLOYEE (
    EMPLOYEE_ID INT NOT NULL,
    FIRST_NAME VARCHAR(20) NOT NULL,
    LAST_NAME VARCHAR(20) NOT NULL,
    MANAGER_ID INT
)"""
DROP_EMPLOYEE_SQL="DROP TABLE EMPLOYEE"
INSERT_SQL = "INSERT INTO EMPLOYEE VALUES "

def SoundexEncoding(Name):
    """Return the 4-character SOUNDEX code for a string. Take
    first letter, then encode subsequent consonants as numbers.
    Ignore repeated codes (e.g. MM codes as 5, not 55), unless
    separated by a vowel (e.g. Tymczak codes as T522: CZ gives a
    single 2, but the K after the vowel A gives another 2)."""
    if (Name==None or Name==""): return None
    Name = Name.upper() # ignore case!
    SoundexCode=Name[0]
    LastCode=SoundexDict.get(Name[0])
    for char in Name[1:]:
        # Characters not in the table (spaces, hyphens) are skipped like H, W
        CurrentCode=SoundexDict.get(char, "8")
        if (CurrentCode=="8"):
            pass # Don't include, and don't separate repeated consonants
        elif (CurrentCode=="7"):
            LastCode=None # Include consonants after vowels
        elif (CurrentCode!=LastCode): # Skip doubled letters
            SoundexCode+=CurrentCode
            LastCode=CurrentCode
        if len(SoundexCode)==4: break # limit to 4 characters
    # Pad with zeroes (e.g. Lee is L000):
    SoundexCode += "0"*(4-len(SoundexCode))
    return SoundexCode

# Create the EMPLOYEE table
def CreateTable(Conn):
    NewCursor=Conn.cursor()
    try:
        try:
            NewCursor.execute(DROP_EMPLOYEE_SQL)
        except ODBC.Windows.Error:
            pass # The table did not exist yet
        NewCursor.execute(CREATE_EMPLOYEE_SQL)
    finally:
        NewCursor.close()

# Insert a new employee into the table
def CreateEmployee(Conn,DataValues):
    NewCursor=Conn.cursor()
    try:
        NewCursor.execute(INSERT_SQL+DataValues)
    finally:
        NewCursor.close()

# Do a soundslike query on a name
def PrintUsersLike(Conn,Name):
    if (Name==None or Name==""): return
    print "Users with last name similar to",Name+":"
    SoundexName = SoundexEncoding(Name)
    QuerySQL = "SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME FROM"
    QuerySQL+= " EMPLOYEE WHERE LAST_NAME LIKE '"+Name[0]+"%'"
    NewCursor=Conn.cursor()
    try:
        NewCursor.execute(QuerySQL)
        for EmployeeRow in NewCursor.fetchall():
            if (SoundexEncoding(EmployeeRow[2])==SoundexName):
                print EmployeeRow
    finally:
        NewCursor.close()

if (__name__=="__main__"):
    # Pass clear_auto_commit=0, because MySQL doesn't support
    # transactions (yet) and can't handle autocommit flag
    # Replace "MyDB" with your datasource name!
    Conn=ODBC.Windows.Connect("MyDB",clear_auto_commit=0)
    CreateTable(Conn)
    CreateEmployee(Conn,'(1,"Bob","Hilbert",Null)')
    CreateEmployee(Conn,'(2,"Sarah","Pfizer",Null)')
    CreateEmployee(Conn,'(3,"Sandy","Lee",1)')
    CreateEmployee(Conn,'(4,"Pat","Labor",2)')
    CreateEmployee(Conn,'(5,"Larry","Helper",Null)')
    PrintUsersLike(Conn,"Heilbronn")
    PrintUsersLike(Conn,"Pfizer")
    PrintUsersLike(Conn,"Washington")
    PrintUsersLike(Conn,"Lieber")
    Conn.close()


Listing 14-4: Soundex output

Users with last name similar to Heilbronn:
(1.0, 'Bob', 'Hilbert')
(5.0, 'Larry', 'Helper')
Users with last name similar to Pfizer:
(2.0, 'Sarah', 'Pfizer')
Users with last name similar to Washington:
Users with last name similar to Lieber:
(4.0, 'Pat', 'Labor')
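For comparison, here is a Python 3 sketch of the same lookup that uses sqlite3 instead of mxODBC. The functions soundex and users_like are rewrites of the listing's functions, not library calls. The query passes the LIKE pattern as a bound parameter rather than pasting it into the SQL string, a technique the “Reusable SQL statements” section discusses.

```python
import sqlite3

# Letter classes from SoundexDict in Listing 14-3: "1"-"6" are consonant
# classes, "7" marks vowels (separators), "8" marks H and W (ignored).
CODES = {}
for letters, code in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                      ("L", "4"), ("MN", "5"), ("R", "6"),
                      ("AEIOUY", "7"), ("HW", "8")):
    for letter in letters:
        CODES[letter] = code


def soundex(name):
    """Return the 4-character code using the rules of Listing 14-3."""
    name = name.upper()
    result = name[0]
    last = CODES.get(name[0])
    for char in name[1:]:
        code = CODES.get(char, "8")   # non-letters are skipped like H and W
        if code == "7":
            last = None               # a vowel lets the next consonant repeat
        elif code != "8" and code != last:
            result += code
            last = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")       # pad, e.g. Lee -> L000


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EMPLOYEE_ID INTEGER NOT NULL, "
             "FIRST_NAME VARCHAR(20) NOT NULL, LAST_NAME VARCHAR(20) NOT NULL, "
             "MANAGER_ID INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
                 [(1, "Bob", "Hilbert", None), (2, "Sarah", "Pfizer", None),
                  (3, "Sandy", "Lee", 1), (4, "Pat", "Labor", 2),
                  (5, "Larry", "Helper", None)])


def users_like(conn, name):
    """Rows whose LAST_NAME shares name's first letter and Soundex code."""
    target = soundex(name)
    cur = conn.execute("SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME FROM EMPLOYEE "
                       "WHERE LAST_NAME LIKE ? ORDER BY EMPLOYEE_ID",
                       (name[0] + "%",))
    return [row for row in cur.fetchall() if soundex(row[2]) == target]
```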


Examining Relational Metadata 

When a cursor returns data, the cursor attribute description is metadata:
definitions of the columns involved. A column’s definition is represented as a
seven-item sequence; description is a sequence of such definitions. The items in
the sequence are listed in Table 14-2.
Table 14-2
Metadata Sequence Pieces

Index   Data
0       Column name
1       Type code
2       Display size (in columns)
3       Internal size (in characters or bytes)
4       Numeric precision
5       Numeric scale
6       Nullable (if 0, no nulls are allowed)


For example, the following is metadata from the EMPLOYEE table of the Soundex
example:

>>> mc.execute("select FIRST_NAME, MANAGER_ID from EMPLOYEE")
>>> mc.description
(('FIRST_NAME', 12, None, None, 5, 0, 0), ('MANAGER_ID', 3,
None, None, 1, 0, 1))

Note The mxODBC module does not return display size and internal size. 
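Other modules fill in even less. sqlite3, for example, reports only the column name and sets the other six entries to None, as this Python 3 sketch shows. It uses the same “where 1=2” trick as Listing 14-5 to get metadata without fetching any rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (FIRST_NAME VARCHAR(20) NOT NULL, MANAGER_ID INT)")
# No rows match, but description is still filled in.
cur = conn.execute("SELECT FIRST_NAME, MANAGER_ID FROM EMPLOYEE WHERE 1=2")
names = [column[0] for column in cur.description]
```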


Example: Creating Auditing Tables 

Sometimes, it is useful to view old versions of data. For example, you may want to
know both someone’s current address and his or her old address. Or, a medical
database may track who changed a patient’s record, and when. One way to capture
this data is with a mirror table: whenever an insert, update, or delete occurs in the
main table, a corresponding row is written to the mirror table. The mirror rows
contain data, a timestamp, and the ID of the editing user; therefore, they provide a
full audit trail of all data changes. Ideally, mirror rows should be inserted in the
same transaction as the data manipulation, to ensure that the audit trail is
accurate.

The script shown in Listing 14-5 uses metadata to write SQL that creates a mirror 
table for a data table. Listing 14-6 shows a sample of the script’s output. 
Listing 14-5: MirrorMaker.py

"""MirrorMaker builds mirror tables, for purposes of auditing.
For a table TABLEX, we create SQL to add a mirror table
TABLEX_M. The mirror table tracks version numbers, update
times, and updating users."""

import ODBC.Windows

# Replace these constants with values for your database
SERVER_NAME = "MyDB"
USER_NAME = "eva"
PASSWORD = "destruction"
SAMPLE_TABLE = "EMPLOYEE"

# Metadata for the mirror table's special columns
VERSION_NUMBER_COLUMN=("VERSION_NUMBER",
    ODBC.Windows.NUMERIC,None,None,0,0,0)
LAST_UPDATE_COLUMN=("LAST_UPDATE",
    ODBC.Windows.TIMESTAMP,None,None,0,0,0)
UPDATE_USER_COLUMN=("UPDATE_USER_ID",
    ODBC.Windows.NUMERIC,None,None,0,0,0)

def CreateColumnDefSQL(ColumnTuple):
    ColumnSQL = ColumnTuple[0] # name
    ColumnSQL += " "
    # The mxODBC function sqltype returns the SQL name of a
    # (numeric) column type. (For a different database
    # module, you may need to code this translation yourself.)
    OracleColumnType = ODBC.Windows.sqltype[ColumnTuple[1]]
    ColumnSQL += OracleColumnType
    # width of character fields
    if (OracleColumnType == "VARCHAR2" or
        OracleColumnType == "VARCHAR"):
        # Internal size not returned by mxODBC; so, use precision
        ColumnSQL += "(" + str(ColumnTuple[4]) + ")" # width
    if (OracleColumnType == "NUMBER"):
        if (ColumnTuple[4]): # precision,scale
            ColumnSQL += ("(" + str(ColumnTuple[4]) + "," +
                          str(ColumnTuple[5]) + ")")
    if (ColumnTuple[6]): # nullable
        ColumnSQL += " NULL"
    else:
        ColumnSQL += " NOT NULL"
    return ColumnSQL

def CreateMirrorTableDefSQL(MyConnection,TableName):
    MyCursor = MyConnection.cursor()
    # This query returns no rows (because 1!=2), but returns
    # metadata (the definitions of each column in the table).
    # Analogous to the SQL command "describe TABLENAME".
    MyCursor.execute("SELECT * from "+TableName+" where 1=2")
    SQLString = "CREATE TABLE "+TableName+"_M ("
    # Loop through columns, and create DDL for each
    FirstColumn = 1
    for ColumnInfo in MyCursor.description:
        if (FirstColumn!=1):
            SQLString=SQLString+","
        FirstColumn=0
        SQLString += "\n"+CreateColumnDefSQL(ColumnInfo)
    # Add SQL to create the special mirror-table columns
    SQLString += ",\n" + CreateColumnDefSQL(VERSION_NUMBER_COLUMN)
    SQLString += ",\n" + CreateColumnDefSQL(LAST_UPDATE_COLUMN)
    SQLString += ",\n" + CreateColumnDefSQL(UPDATE_USER_COLUMN)
    SQLString += "\n)\n"
    MyCursor.close()
    return SQLString

if (__name__=="__main__"):
    MyConnection = ODBC.Windows.Connect(SERVER_NAME,USER_NAME,PASSWORD)
    print CreateMirrorTableDefSQL(MyConnection,SAMPLE_TABLE)
    MyConnection.close()


Listing 14-6: MirrorMaker output 


CREATE TABLE EMPLOYEE_M ( 
EMPLOYEE_ID DECIMAL NOT NULL, 
FIRST_NAME VARCHAR(O) NOT NULL, 
LAST_NAME VARCHAR(O) NOT NULL, 
MANAGER_ID DECIMAL NULL, 
VERSION_NUMBER NUMERIC NOT NULL, 
LAST_UPDATE TIMESTAMP NOT NULL, 
UPDATE_USER_ID NUMERIC NOT NULL 
) 


Advanced Features of the DB API 

Relational databases feature various column types, such as INT and VARCHAR. A
database module should export constants describing these datatypes; these
constants are used in description metadata. For example, the following code checks
a column type (12) against a module-level constant (VARCHAR):
>>> MyCursor.execute("SELECT FIRST_NAME from EMPLOYEE where
FIRST_NAME='Bob'")
>>> MyCursor.description[0]
('FIRST_NAME', 12, None, None, 3, 0, 0)
>>> MyCursor.description[0][1]==ODBC.Windows.VARCHAR
1

Some column types, such as dates, demand a particular kind of data. A database
module should export functions to construct date, time, and timestamp values. For
example, the function Date(year,month,day) constructs a date value (suitable for
insertion into the database) corresponding to the given year, month, and day. The
module mxDateTime provides the preferred implementation of date and time objects.
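In sqlite3, the DB-API constructors are simply aliases for the datetime classes. The Python 3 sketch below builds values with them. It then stores a date as ISO-8601 text, because recent Python versions deprecate sqlite3's built-in date adapters. The HIRE table is hypothetical.

```python
import datetime
import sqlite3

# DB-API 2.0 constructors as exported by sqlite3.
d = sqlite3.Date(2002, 5, 1)
ts = sqlite3.Timestamp(2002, 5, 1, 12, 30, 0)
t = sqlite3.Time(12, 30, 0)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HIRE (EMPLOYEE_ID INTEGER, HIRED TEXT)")
# Store ISO-8601 text explicitly rather than relying on implicit adapters.
conn.execute("INSERT INTO HIRE VALUES (?, ?)", (1, d.isoformat()))
stored = conn.execute("SELECT HIRED FROM HIRE").fetchone()[0]
```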

Input and output sizes 

The cursor attribute arraysize specifies how many rows, by default, to return in
each call to fetchmany. It defaults to 1, but you can increase it if desired.
Manipulating arraysize is more efficient than passing a size parameter to fetchmany:

>>> MyCursor.execute("SELECT FIRST_NAME FROM EMPLOYEE")
>>> MyCursor.rowcount # total fetchable rows
5
>>> MyCursor.fetchmany() # default arraysize is 1
[('Bob',)]
>>> MyCursor.arraysize=5 # get up to 5 rows at once
>>> MyCursor.fetchmany() # (only 4 left, so I don't get 5)
[('Sarah',), ('Sandy',), ('Pat',), ('Larry',)]
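Here is the same arraysize behavior, sketched in Python 3 against an in-memory sqlite3 table that holds the five employee names from the examples above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (FIRST_NAME TEXT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?)",
                 [("Bob",), ("Sarah",), ("Sandy",), ("Pat",), ("Larry",)])

cur = conn.execute("SELECT FIRST_NAME FROM EMPLOYEE ORDER BY rowid")
default_size = cur.arraysize      # 1 unless changed
batch1 = cur.fetchmany()          # uses arraysize
cur.arraysize = 5
batch2 = cur.fetchmany()          # up to 5, but only 4 remain
batch3 = cur.fetchmany()          # empty list once exhausted
```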

The cursor methods setinputsizes(size) and setoutputsize(size[,columnindex]) let
you set an “expected size” for columns before executing a SQL statement. These
methods are optional, and exist to improve performance and memory usage.

The size parameter for setinputsizes is a sequence. Each entry in size should
specify the maximum length for each parameter. If an entry in size is None, then no
block of memory will be set aside for the corresponding parameter value (this is
the default behavior).

The method setoutputsize sets a maximum buffer size for data read from large
columns (LONG or BLOB). If columnindex is not specified, the buffer size is set for
all large columns in the result sequence. For example, the following code limits the
data read from the long DESCRIPTION column to 50 characters:

>>> MyCursor.setoutputsize(50,1)
>>> MyCursor.execute("select GAME_NAME, DESCRIPTION from GAME")
>>> MyCursor.fetchone()
('005', 'You play a spy who must take a briefcase and suc')
Reusable SQL statements

Before a SQL statement can be executed, it must be parsed. Vendors such as Oracle
cache recently parsed SQL commands so that the commands need not be re-parsed
if they are used again. Therefore, you should build reusable SQL statements with
marked parameters, instead of hard-coded values. This way, the parameters can be
passed into the execute method. The following example reuses the same SQL
statement to query a video game database twice:

>>> SQLQuery = "select GAME_NAME from GAME where GAME_ID = ?"
>>> MyCursor.execute(SQLQuery,(60,)) # tuple provides ID of 60
>>> MyCursor.fetchall()
[('Air Combat 22',)]
>>> MyCursor.execute(SQLQuery,(200,)) # no need to re-parse SQL
>>> MyCursor.fetchall()
[('Badlands',)]

The syntax for parameter marking is described by the module variable paramstyle
(see the next section, “Database library information”). The cursor method
executemany(command,parametersequence) runs the same SQL statement
command many times, once for each collection of parameters in parametersequence.
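Here is a Python 3 sketch of the same idea with sqlite3, whose paramstyle is qmark. The GAME rows come from the session above, and the extra title is hypothetical. The sketch also loads the table with executemany and shows that bound values need no quoting or escaping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE GAME (GAME_ID INTEGER PRIMARY KEY, GAME_NAME TEXT)")
# executemany runs one statement once per parameter tuple.
cur.executemany("INSERT INTO GAME VALUES (?, ?)",
                [(60, "Air Combat 22"), (200, "Badlands")])

SQL_QUERY = "SELECT GAME_NAME FROM GAME WHERE GAME_ID = ?"
results = {}
for game_id in (60, 200):
    cur.execute(SQL_QUERY, (game_id,))   # same SQL text, new parameter
    results[game_id] = cur.fetchall()

# A quote inside a bound value is passed through safely.
cur.execute("INSERT INTO GAME VALUES (?, ?)", (7, "Tony's Quest"))
```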

Database library information

The module variable apilevel is a string describing the supported DB API level. It
should be either “1.0” or “2.0”; if it is not available, assume the supported API
level is 1.0.

The module variable threadsafety describes what level of concurrent access the 
module supports: 

0 Threads may not share the module 

1 Threads may share the module 

2 Threads may share connections 

3 Threads may share cursors 

The module variable paramstyle describes which style of parameter marking the
module expects to see in SQL statements. Following are the legal values of
paramstyle and an example of such a marked parameter:

qmark      WHERE NAME=?
numeric    WHERE NAME=:1
named      WHERE NAME=:name
format     WHERE NAME=%s
pyformat   WHERE NAME=%(name)s
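You can inspect these variables at run time. The following is a Python 3 sketch with sqlite3. Its threadsafety value depends on how the SQLite library was built in Python 3.11 and later, and it was always 1 before that. sqlite3 advertises the qmark style but also accepts the named style.

```python
import sqlite3

api_level = sqlite3.apilevel          # "2.0"
style = sqlite3.paramstyle            # "qmark"
threads = sqlite3.threadsafety        # build-dependent in Python 3.11+

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (FIRST_NAME TEXT, LAST_NAME TEXT)")
conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?)", ("Bob", "Hilbert"))
# Named markers take a mapping instead of a tuple.
row = conn.execute("SELECT FIRST_NAME FROM EMPLOYEE WHERE LAST_NAME = :name",
                   {"name": "Hilbert"}).fetchone()
```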
Error hierarchy

Database warnings and errors are subclasses of the class StandardError from the
module exceptions. You can catch the Error class to do general error handling, or
catch more specific exceptions. Figure 14-1 shows the inheritance hierarchy of
database exceptions. See Table 14-3 for a description of each exception.


Figure 14-1: Database exception class hierarchy



Table 14-3
Database Exceptions

Type                Meaning
Warning             Significant warnings, such as data-value truncation during insertion.
Error               Base class for other errors. Not raised directly.
InterfaceError      Raised when the database module encounters an internal error. An
                    InterfaceError stems from the database module, not the database itself.
DatabaseError       Errors relating to the database itself. Mostly used as a base class
                    for other errors.
DataError           Errors due to invalid data, such as an out-of-range numeric value.
OperationalError    Operational errors, such as a failure to connect to the database.
IntegrityError      Data integrity errors, such as a missing foreign key.
InternalError       Internal database error, such as a cursor becoming disconnected.
ProgrammingError    Invalid call to the database module; for example, trying to use a
                    cursor that has been closed, or calling fetch on a cursor before
                    executing a command that returns data.
NotSupportedError   Some portions of the DB API are optional. A module that does not
                    implement optional methods may raise NotSupportedError if you
                    attempt to call them.
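In practice you catch the most specific exception first and fall back to Error for anything else. A minimal sketch, assuming Python 3's sqlite3 module; the users table and the add_user function are hypothetical.

```python
import sqlite3

def add_user(conn, user_id, name):
    try:
        conn.execute('INSERT INTO users VALUES (?, ?)', (user_id, name))
        return 'ok'
    except sqlite3.IntegrityError:
        # Specific case: for example, a duplicate primary key
        return 'duplicate'
    except sqlite3.Error as e:
        # General case: any other DB API error from this module
        return 'error: %s' % e

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)')
print(add_user(conn, 1, 'Ann'))   # ok
print(add_user(conn, 1, 'Bob'))   # duplicate
```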


Summary 

Python’s standard libraries include powerful tools for handling dictionaries on disk.
Modules implementing the Python Database API permit easy access to relational
databases. In this chapter, you:

✦ Learned about Python’s flavors of dbm.

✦ Stored and retrieved dictionary data on disk.

✦ Looked up employees with a “sounds-like” query.

✦ Used table metadata to easily build new relational tables.

In the next chapter, you learn how to harness Python for networking. 
Networking



The modules covered in this chapter teach you everything
you need to know to communicate between programs on
a network. The networking topics covered here don’t require
more than one computer, however; you can use networking
for interprocess communication on a single machine.

In This Chapter

Networking background
Working with addresses and host names
Communicating with low-level sockets
Example: a multicast chat application
Using SocketServers
Processing Web browser requests
Handling multiple requests without threads


Networking Background

This section provides a brief introduction to some of the
terms you’ll encounter in the rest of this chapter.

A socket is a network connection endpoint. When your Web
browser requests the main Web page of www.python.org, for
example, your Web browser creates a socket and instructs it
to connect to the Web server hosting the Python Web site,
where the Web server is also listening on a socket for incoming
requests. The two sides use the sockets to send messages
and other data back and forth.

When in use, each socket is bound to a particular IP address
and port. An IP address is a sequence of four numbers in the
range of 0 to 255 (for example, 173.15.20.201); port numbers
range from 0 to 65535. Port numbers less than 1024 are
reserved for well-known networking services (a Web server, for
example, uses port 80); this limit is stored in the socket
module’s IPPORT_RESERVED variable. You can use other port
numbers for your own programs, although technically, ports
1024 to 5000 (socket.IPPORT_USERRESERVED) are used for
officially registered applications (although nobody will yell at
you for using them).

Not all IP addresses are visible to the rest of the world. Some,
in fact, are specifically reserved for addresses that are never
public (such as addresses of the form 192.168.y.z or 10.x.y.z).
The address 127.0.0.1 is the localhost address; it always refers
to the current computer. Programs can use this address to
connect to other programs running on the same machine.
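Python 3.3 and later include an ipaddress module (not covered in this book) that can classify addresses like these. A small sketch; classify is a hypothetical helper that reports whether an address is private and whether it is a loopback address.

```python
import ipaddress

def classify(text):
    addr = ipaddress.ip_address(text)
    return addr.is_private, addr.is_loopback

for text in ['192.168.1.20', '10.0.0.6', '127.0.0.1', '132.151.1.90']:
    print(text, classify(text))
```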
Remembering more than a handful of IP addresses can be tedious, so you can also
pay a small fee and register a host name or domain name for a particular address 
(not surprisingly, more people visit your Web site if they can point their Web 
browser at www.threemeat.com instead of 208.114.27.12). Domain Name Servers 
(DNS) handle the task of mapping the names to the IP addresses. Every computer 
can have a host name, even if it isn’t an officially registered one. 

Exactly how messages are transmitted through a network is based on many factors, 
one of which is the different protocols that are in use. Many protocols build upon 
simpler, lower-level protocols to form a protocol stack. HTTP, for example, is the 
protocol used to communicate between Web browsers and Web servers, and it is 
built upon the TCP protocol, which is in turn built upon a protocol named IP. 

When sending messages between two programs of your own, you usually choose 
between the TCP and UDP protocols. TCP creates a persistent connection between 
two endpoints, and the messages that you send are guaranteed to arrive at their 
destination and to arrive in order. UDP is connectionless, a bit faster, but less
reliable. Messages you send may or may not make it to the other end; and if they do
make it, they might arrive out of order. Occasionally, more than one copy of a 
message makes it to the receiver, even if you sent it only once. 

You can find volumes full of additional information on networking; this section
doesn’t even scratch the surface. It does, however, give you a head start on
understanding the following sections.


Working with Addresses and Host Names 

The socket module provides several functions for working with host names and
addresses.

Note: The socket module is a very close wrapper around the C socket library; and like
the C version, it supports all sorts of options. This chapter covers the most
common and useful features of sockets; consult the Winsock help file or the
UNIX socket man pages for coverage of more arcane features. In many cases, the
socket module defines variables that map directly to the C equivalent (for
example, socket.IP_MAX_MEMBERSHIPS is equivalent to the C constant of the
same name).

gethostname() returns the host name for the computer on which the program is
running:

>>> import socket
>>> socket.gethostname()
'endor'
gethostbyname(name) tries to resolve the given host name to an IP address. First
a check is made to determine whether the current computer can do the translation.
If it doesn’t know, a request is sent to a remote DNS server (which in turn may ask
other DNS servers too). gethostbyname returns the IP address as a string, or raises
an exception if the lookup fails:

>>> socket.gethostbyname('endor')
'10.0.0.6'
>>> socket.gethostbyname('www.python.org')
'132.151.1.90'
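Because a failed lookup raises an exception, code that resolves user-supplied names usually wraps the call. This sketch assumes Python 3, where the exception is socket.gaierror; resolve is a hypothetical helper. A dotted numeric address is returned as-is, without a DNS query.

```python
import socket

def resolve(name):
    # Returns the dotted IP address string, or None if the lookup fails
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None

print(resolve('127.0.0.1'))   # a numeric address comes straight back
```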

An extended form, gethostbyname_ex(name), returns a 3-tuple consisting of the
primary host name of the given address, a list of alternative host names for the
same IP address, and a list of other IP addresses for the same interface on that
same host (both lists may be empty):

>>> socket.gethostbyname('www.yahoo.com')
'64.58.76.178'
>>> socket.gethostbyname_ex('www.yahoo.com')
('www.yahoo.akadns.net', ['www.yahoo.com'],
 ['64.58.76.178', '64.58.76.176', '216.32.74.52',
  '216.32.74.50', '64.58.76.179', '216.32.74.53',
  '64.58.76.177', '216.32.74.51', '216.32.74.55'])
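The 3-tuple unpacks naturally into three names. A sketch assuming Python 3; describe_host is a hypothetical helper, and the exact names and lists you get back depend on your resolver configuration.

```python
import socket

def describe_host(name):
    primary, aliases, addresses = socket.gethostbyname_ex(name)
    return {'primary': primary, 'aliases': aliases, 'addresses': addresses}

print(describe_host('127.0.0.1'))
```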

The gethostbyaddr(address) function does the same thing, except that you
supply it an IP address string instead of a host name:

>>> socket.gethostbyaddr('132.151.1.90')
('parrot.python.org', ['www.python.org'], ['132.151.1.90'])

getservbyname(service, protocol) takes a service name (such as ‘telnet’ or
‘ftp’) and a protocol (such as ‘tcp’ or ‘udp’) and returns the port number used by
that service:

>>> socket.getservbyname('http','tcp')
80
>>> socket.getservbyname('telnet','tcp')
23
>>> socket.getservbyname('doom','udp')
666    # id Software registered this for the game "Doom"
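The port numbers come from the system's services database (/etc/services on UNIX), so an unknown name raises an exception. A sketch assuming Python 3, where that exception is OSError; port_for is a hypothetical helper that returns a caller-supplied default instead.

```python
import socket

def port_for(service, protocol='tcp', default=None):
    # getservbyname raises OSError when the service/protocol pair is unknown
    try:
        return socket.getservbyname(service, protocol)
    except OSError:
        return default

print(port_for('http'), port_for('no-such-service-xyz', default=0))
```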

Often, non-Python programs store and use IP addresses in their 32-bit packed form.
The inet_aton(ip_addr) and inet_ntoa(packed) functions convert back and
forth between this form and an IP address string:

>>> socket.inet_aton('177.20.1.201')
'\261\024\001\311'    # A 4-byte string
>>> socket.inet_ntoa('\x7F\x00\x00\x01')
'127.0.0.1'
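In Python 3 the packed form is a bytes object rather than a string, and the struct module can turn it into the 32-bit number itself. A short sketch:

```python
import socket
import struct

packed = socket.inet_aton('177.20.1.201')
print(packed)                      # b'\xb1\x14\x01\xc9'
print(socket.inet_ntoa(packed))    # 177.20.1.201
# '!I' means one unsigned 32-bit integer in network (big-endian) order
number, = struct.unpack('!I', packed)
print(number)                      # 2970878409
```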
The socket module also defines a few variables representing some reserved IP addresses.
INADDR_ANY and INADDR_BROADCAST are reserved IP addresses referring to any IP
address and the broadcast address, respectively; and INADDR_LOOPBACK refers to
the loopback device, always at address 127.0.0.1. These variables are in the
numeric 32-bit form.
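To see these numeric constants as dotted strings, pack them in network byte order and pass the result to inet_ntoa. A sketch assuming Python 3; numeric_to_dotted is a hypothetical helper.

```python
import socket
import struct

def numeric_to_dotted(value):
    # Pack the 32-bit number in network byte order, then format it
    return socket.inet_ntoa(struct.pack('!I', value))

print(numeric_to_dotted(socket.INADDR_LOOPBACK))   # 127.0.0.1
print(numeric_to_dotted(socket.INADDR_ANY))        # 0.0.0.0
```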

The getfqdn([name]) function returns the fully qualified domain name for the given
host name (if omitted, it returns the fully qualified domain name of the local host):

>>> socket.getfqdn('')
'dialup84.lasal.net'

New Feature: getfqdn was new in Python 2.0.


Communicating with Low-Level Sockets 

Although Python provides some wrappers that make using sockets easier (you’ll 
see them later in this chapter), you can always work with sockets directly too. 

Creating and destroying sockets 

The socket(family, type[, proto]) function in the socket module creates a
new socket object. The family is usually AF_INET, although others such as AF_IPX
are sometimes available, depending on the platform. The type is most often
SOCK_STREAM (for connection-oriented, reliable TCP connections) or SOCK_DGRAM
(for connectionless UDP messages):

>>> from socket import *
>>> s = socket(AF_INET,SOCK_STREAM)

The combination of family and type usually implies a protocol, but you can specify
it using the optional third parameter to socket, using values such as IPPROTO_TCP
or IPPROTO_RAW. Instead of using the IPPROTO_ variables, you can use the
getprotobyname(proto) function:

>>> getprotobyname('tcp')
6
>>> IPPROTO_TCP
6
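A socket object remembers the family, type, and protocol it was created with. This Python 3 sketch passes the protocol number explicitly; it uses the IPPROTO_TCP constant rather than getprotobyname, which depends on the system's protocols database. make_tcp_socket is a hypothetical helper.

```python
import socket

def make_tcp_socket():
    # Family, type, and an explicit protocol number (IPPROTO_TCP is 6)
    return socket.socket(socket.AF_INET, socket.SOCK_STREAM,
                         socket.IPPROTO_TCP)

s = make_tcp_socket()
print(s.family, s.type, s.proto)
s.close()
```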

fromfd(fd, family, type[, proto]) is a rarely used function for creating a
socket object from an open file descriptor (returned from a file’s fileno()
method). The descriptor should be connected to a real socket, and not to a file. The
fileno() method of a socket object returns the file descriptor (an integer) for this
socket. See the section “Handling Multiple Requests Without Threads” later in this
chapter for an idea of where this might be useful.
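Here is a quick look at fileno and fromfd together. This is a sketch for Python 3, where fromfd duplicates the descriptor, so the new object has its own descriptor number and both objects must be closed.

```python
import socket

original = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fd = original.fileno()                  # a small non-negative integer
copy = socket.fromfd(fd, socket.AF_INET, socket.SOCK_STREAM)
dup_fd = copy.fileno()                  # a different descriptor, same socket
print(fd, dup_fd)
copy.close()
original.close()                        # fileno() now returns -1
```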
When you are finished with a socket object, you call the close() method, after
which no further operation on the object will succeed (sockets are automatically
closed when they are garbage collected, but it’s a good idea to explicitly close them
when possible, both to free up resources sooner and to make your program
clearer). Alternatively, you can use the shutdown(how) method to close one or
both halves of a connection. Passing a value of 0 prevents the socket from receiving
any more data, 1 prevents any additional sends, and 2 prevents additional
transmission in either direction.
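The half-close is easy to watch with socket.socketpair, which returns two already-connected sockets (available on UNIX, and on Windows since Python 3.5). Modern Python also names the how values SHUT_RD, SHUT_WR, and SHUT_RDWR (0, 1, and 2). A sketch; half_close_demo is a hypothetical helper.

```python
import socket

def half_close_demo():
    a, b = socket.socketpair()       # two connected sockets in one process
    a.sendall(b'last words')
    a.shutdown(socket.SHUT_WR)       # same as shutdown(1): a sends no more
    first = b.recv(1024)             # the data sent before the shutdown
    second = b.recv(1024)            # b'' means the other side is done sending
    b.sendall(b'reply')              # a can still receive
    third = a.recv(1024)
    a.close()
    b.close()
    return first, second, third

print(half_close_demo())
```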

Connecting sockets 

When two sockets connect (via TCP, for example), one side listens for and accepts
an incoming connection, and the other side initiates that connection. The listening
side creates a socket, calls bind(address) to bind it to a particular address and
port, calls listen(backlog) to listen for incoming connections, and finally calls
accept() to accept the new, incoming connection:

>>> s = socket(AF_INET,SOCK_STREAM)
>>> s.bind(('127.0.0.1',44444))
>>> s.listen(1)
>>> q,v = s.accept()    # Returns socket q and address v

Note that the preceding code will block or appear to hang until a connection is
present to be accepted. No problem; just initiate a connection from another Python
interpreter. The connecting side creates a socket and calls connect(address):

>>> s = socket(AF_INET,SOCK_STREAM)
>>> s.connect(('127.0.0.1',44444))

At this point, the first side of the connection uses socket q to communicate with the
second side, using socket s. To verify that they are connected, enter the following
line on the first, or server, side:

>>> q.send('Hello from Python!')
18    # Number of bytes sent

On the other side, enter the following:

>>> s.recv(1024)    # Receive up to 1024 bytes
'Hello from Python!'
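The same exchange can run in a single script if the listening side runs in its own thread. This sketch is written for Python 3, where sockets send and receive bytes; it binds to port 0 so the operating system picks a free port, and demo_exchange and serve_once are hypothetical names.

```python
import socket
import threading

def serve_once(listener, greeting):
    # Listening side: accept one connection, send the greeting, close it
    conn, addr = listener.accept()
    with conn:
        conn.sendall(greeting)

def demo_exchange(greeting=b'Hello from Python!'):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))     # port 0: let the OS choose
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener, greeting))
    server.start()
    # Connecting side: read until the server closes its end
    chunks = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect(('127.0.0.1', port))
        while True:
            chunk = s.recv(1024)
            if not chunk:               # empty bytes: connection closed
                break
            chunks.append(chunk)
    server.join()
    listener.close()
    return b''.join(chunks)

print(demo_exchange())
```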

The addresses you pass to bind and connect are 2-tuples of (ipAddress, port) for
AF_INET sockets. Instead of connect, you can also call the connect_ex(address)
method. If the underlying call to the C connect fails, connect_ex returns the
error code (or 0 for success) instead of raising an exception.
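connect_ex is handy for checking whether anything is listening on a port. A sketch assuming Python 3; probe and open_and_closed_results are hypothetical helpers, and the nonzero value returned for a closed port is a platform-specific error number.

```python
import socket

def probe(host, port):
    # 0 means the connection succeeded; anything else is an error number
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2.0)
        return s.connect_ex((host, port))

def open_and_closed_results():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    while_open = probe('127.0.0.1', port)    # someone is listening
    listener.close()
    after_close = probe('127.0.0.1', port)   # typically "connection refused"
    return while_open, after_close

print(open_and_closed_results())
```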
Tip: When you call listen, you pass in a number specifying the maximum number of
incoming connections that will be placed in a wait queue. If more connections
arrive when the queue is full, the remote side is informed that the connection was
refused. The SOMAXCONN variable in the socket module indicates the maximum size
the wait queue can be.

The accept() method returns an address of the same form used by bind and
connect, indicating the address of the remote socket. The following uses the
v variable from the preceding example:

>>> v
('127.0.0.1', 1039)

UDP sockets are not connection-oriented, but you can still call connect to
associate a socket with a given destination address and port (see the next section
for details).

Sending and receiving data 

send(string[, flags]) sends the given string of bytes to the remote socket.
sendto(string[, flags], address) sends the given string to a particular
address. Generally, the send method is used with connection-oriented sockets, and
sendto is used with connectionless sockets, but if you call connect on a
UDP socket to associate it with a particular destination, you can then call send
instead of sendto.

Both send and sendto return the number of bytes that were actually sent. When
sending large amounts of data quickly, you may want to ensure that the entire
message was sent, using a function like the following:

import time

def safeSend(sock,msg):
    sent = 0
    while msg:
        i = sock.send(msg)    # On failure, send raises socket.error
        sent += i
        msg = msg[i:]         # Keep only the part not yet sent
        if msg:
            time.sleep(.25)   # Wait a little while the queue empties
    return sent

This keeps resending part of the message as needed until the entire message has
been sent.
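Newer versions of Python also give socket objects a sendall method that runs this kind of loop for you, returning only when every byte has been sent (or raising an exception). The Python 3 sketch below pushes about a megabyte through a socketpair while a second thread reads it; send_big is a hypothetical helper.

```python
import socket
import threading

def send_big(payload):
    a, b = socket.socketpair()
    received = []

    def reader():
        # Read until the sending side shuts down its half
        while True:
            chunk = b.recv(65536)
            if not chunk:
                break
            received.append(chunk)

    t = threading.Thread(target=reader)
    t.start()
    a.sendall(payload)            # loops internally until every byte is sent
    a.shutdown(socket.SHUT_WR)    # tell the reader no more data is coming
    t.join()
    a.close()
    b.close()
    return b''.join(received)

data = bytes(range(256)) * 4000
print(len(send_big(data)))
```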

An even better solution to this problem is to avoid sending data until you know at
least some of it can be written. See “Handling Multiple Requests Without Threads”
later in this chapter for details.
The recv(bufsize[, flags]) method receives an incoming message. If a lot of data
is waiting, it returns only the first bufsize bytes that are waiting.
recvfrom(bufsize[, flags]) does the same thing, except that with AF_INET sockets the
return value is (data, (ipAddress, port)) so that you can see from where the
message originated (this is useful for connectionless sockets).
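Here is a connectionless round trip over the loopback address in Python 3: sendto needs no prior connection, and recvfrom reports where the datagram came from. udp_roundtrip is a hypothetical helper, and the timeout guards against a datagram that never arrives.

```python
import socket

def udp_roundtrip(message):
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(('127.0.0.1', 0))          # port 0: let the OS choose
    address = receiver.getsockname()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(message, address)          # no connect() needed
    receiver.settimeout(2.0)                 # UDP does not guarantee delivery
    data, origin = receiver.recvfrom(1024)   # origin is (ipAddress, port)
    sender_port = sender.getsockname()[1]
    sender.close()
    receiver.close()
    return data, origin, sender_port

print(udp_roundtrip(b'ping'))
```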

The send, sendto, recv, and recvfrom methods all take an optional flags
parameter that defaults to 0. You can use a bitwise-OR on any of the socket.MSG_*
variables to create a value for flags. The values available vary by platform, but
some of the most common are listed in Table 15-1.


Table 15-1
Flag Values for send and recv

Flag             Description
MSG_OOB          Process out-of-band data.
MSG_DONTROUTE    Don't use routing tables; send directly to the interface.
MSG_PEEK         Return the waiting data without removing it from the queue.


For example, if you have an open socket that has a message waiting to be received,
you can take a peek at the message without actually removing it from the queue of
incoming data:

>>> q.recv(1024,MSG_PEEK)
'Hello!'
>>> q.recv(1024,MSG_PEEK)    # You could call this over and over.
'Hello!'
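Here is the same peek-then-read sequence in Python 3, using a socketpair in place of the connected socket q; peek_then_read is a hypothetical helper.

```python
import socket

def peek_then_read():
    a, b = socket.socketpair()
    a.sendall(b'Hello!')
    first = b.recv(1024, socket.MSG_PEEK)    # look, but leave it queued
    second = b.recv(1024, socket.MSG_PEEK)   # the same bytes again
    actual = b.recv(1024)                    # now really consume them
    a.close()
    b.close()
    return first, second, actual

print(peek_then_read())
```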

The makefile([mode[, bufsize]]) method returns a file-like object wrapping
this socket, so that you can then pass it to code that expects a file argument (or
maybe you prefer to use file methods instead of send and recv). The optional
mode and bufsize parameters take the same values as the built-in open function.

Cross-Reference: Chapter 8 explains the use of files and file-like objects.
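makefile is convenient for line-oriented protocols. In Python 3 the mode controls whether you get text or bytes; this sketch uses text mode over a socketpair, and line_exchange is a hypothetical helper. Note the explicit flush: the file object buffers what you write.

```python
import socket

def line_exchange(text):
    a, b = socket.socketpair()
    # Text-mode file wrappers around each end of the connection
    with a.makefile('w') as out, b.makefile('r') as inp:
        out.write(text + '\n')
        out.flush()                # push buffered text onto the socket
        line = inp.readline()      # reads up to and including '\n'
    a.close()
    b.close()
    return line

print(repr(line_exchange('hello')))
```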



Using socket options

A socket object’s getpeername() and getsockname() methods both return a
2-tuple containing an IP address and a port (just as you’d pass to connect or bind).
getpeername returns the address and port of the remote socket to which it is
connected, and getsockname returns the same information for the local socket.
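The local and remote views of one connection mirror each other, as this Python 3 sketch shows. endpoints is a hypothetical helper; create_connection is a standard convenience function that creates a socket and connects it in one call.

```python
import socket

def endpoints():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, addr = listener.accept()
    result = (client.getpeername() == listener.getsockname(),
              conn.getpeername() == client.getsockname(),
              addr == client.getsockname())
    for s in (conn, client, listener):
        s.close()
    return result

print(endpoints())
```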

By default, sockets are blocking, which means that socket method calls don’t return
until the action completes. For example, if the outgoing buffer is full and you try to
send more data, the call to send will try to block until it can put more data into the
buffer. You can change this behavior by calling the setblocking(flag) method
with a value of 0. When a socket is nonblocking, it will raise the socket.error exception
if the requested action would cause it to block. One useful application of this behavior
is that you can create servers that shut down gracefully:

from socket import *
import time

bKeepGoing = 1    # Another part of the program sets this to 0

s = socket(AF_INET,SOCK_STREAM)
s.bind(('10.0.0.6',55555))
s.listen(5)
s.setblocking(0)
while bKeepGoing:
    try:
        q,v = s.accept()
    except error:
        q = None
    if q:
        processRequest(q,v)
    else:
        time.sleep(0.25)

This server continuously tries to accept a new connection and send it off to the
fictional processRequest function. If a new connection isn’t available, it sleeps for a
quarter of a second and tries again. This means that some other part of your
program can set the bKeepGoing variable to 0, and the preceding loop will exit.

Another approach is to call select or poll on your listen socket to detect when
a new connection has arrived. See “Handling Multiple Requests Without Threads”
later in this chapter for more information.
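Here is a Python 3 version of the same idea, as a sketch. In Python 3 a nonblocking accept with nothing pending raises BlockingIOError, and a threading.Event stands in for the bKeepGoing flag. accept_loop and run_demo are hypothetical names; recording the client's address stands in for real request processing.

```python
import socket
import threading
import time

def accept_loop(listener, stop, handled):
    listener.setblocking(False)        # accept() now raises instead of waiting
    while not stop.is_set():
        try:
            conn, addr = listener.accept()
        except BlockingIOError:        # no connection is waiting yet
            time.sleep(0.05)
            continue
        with conn:
            handled.append(addr)       # stand-in for processRequest

def run_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(5)
    stop, handled = threading.Event(), []
    worker = threading.Thread(target=accept_loop,
                              args=(listener, stop, handled))
    worker.start()
    client = socket.create_connection(listener.getsockname())
    deadline = time.monotonic() + 5
    while not handled and time.monotonic() < deadline:
        time.sleep(0.05)
    stop.set()                         # like setting bKeepGoing to 0
    worker.join()
    client.close()
    listener.close()
    return len(handled)

print(run_demo())
```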

Other socket options can be set and retrieved with the setsockopt(level, name,
value) and getsockopt(level, name[, buflen]) methods. Sockets represent
several layers of a protocol stack, and the level parameter specifies at what level
the option should be applied. (For example, the option may pertain to the socket
itself, an intermediate protocol such as TCP, or a lower protocol such as IP.) The
values for level start with SOL_ (SOL_SOCKET, SOL_TCP, and so on). The name of
the option identifies exactly which option you’re talking about, and the socket
module defines whatever option names are available on your platform.

The C version of setsockopt requires that you pass in a buffer for the value
parameter, but in Python you can just pass in a number if that particular option
expects a numeric value. You can also pass in a buffer (a string), but it’s up to you
to make sure you use the proper format. With getsockopt, not specifying the
buflen parameter means you’re expecting a numeric value, and that’s what it
returns. If you do supply buflen, getsockopt returns a string representing a
buffer, and its maximum length will be buflen bytes.

Although there’s a ton of options in existence, Table 15-2 lists some of the more
common ones you’ll need, along with what type of data the value parameter is
supposed to be. For example, use the following to set the send buffer size of a
socket to about 64 KB:




>>> s = socket(AF_INET,SOCK_STREAM)
>>> s.setsockopt(SOL_SOCKET, SO_SNDBUF, 65535)

To get the time-to-live (TTL) value or number of hops a packet can make before
being discarded by a router, use this:

>>> s.getsockopt(SOL_IP, IP_TTL)
32

See the sample chat application in the next section for more examples of using
setsockopt.
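Both calling styles appear in this Python 3 sketch: plain integers for numeric options, and a packed buffer plus buflen for a structure-valued option. It assumes the Linux/BSD layout of the C linger structure (two C ints), which differs on Windows; option_demo is a hypothetical helper.

```python
import socket
import struct

def option_demo():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Numeric options: pass and receive plain integers
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    reuse = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
    # Structure option: pack a buffer, then ask for buflen bytes back.
    # 'ii' (on/off flag, seconds) assumes the Linux/BSD struct linger layout.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 5))
    raw = s.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                       struct.calcsize('ii'))
    linger = struct.unpack('ii', raw)
    s.close()
    return reuse, nodelay, linger

print(option_demo())
```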


Table 15-2
Common setsockopt and getsockopt Options

Option Name          Value              Description

Options for SOL_SOCKET
SO_TYPE              (Get only)         Socket type (for example, SOCK_STREAM)
SO_ERROR             (Get only)         Socket's last error
SO_LINGER            linger struct³     Linger on close if data present
SO_RCVBUF            Number             Input (receive) buffer size
SO_SNDBUF            Number             Output (send) buffer size
SO_RCVTIMEO          Time struct¹       Input (receive) timeout delay
SO_SNDTIMEO          Time struct¹       Output (send) timeout delay
SO_REUSEADDR         Boolean            Enable multiple users of a local address/port

Options for SOL_TCP
TCP_NODELAY          Boolean            Send data immediately instead of waiting for
                                        minimum send amount

Options for SOL_IP
IP_TTL               0-255              Maximum number of hops a packet can travel
IP_MULTICAST_TTL     0-255              Maximum number of hops a multicast packet
                                        can travel
IP_MULTICAST_IF      inet_aton(ip)      Select interface over which to transmit
IP_MULTICAST_LOOP    Boolean            Enable sender to receive a copy of multicast
                                        packets it sends out
IP_ADD_MEMBERSHIP    ip_mreq²           Join a multicast group
IP_DROP_MEMBERSHIP   ip_mreq²           Leave a multicast group

¹ The struct is two C long variables to hold seconds and microseconds.
² The struct is the concatenation of two calls to inet_aton: one for the multicast
address and one for the local address.
³ The struct holds two integers: an on/off flag and the linger time in seconds.






Part III ✦ Networking and the Internet


Converting numbers 

Because the byte ordering can vary by platform, a network order specifies a
standard ordering to use when transferring numbers across a network. The ntohl(x)
and ntohs(x) functions take a network number and convert it to the same number
using the current host’s byte ordering, and the htonl(x) and htons(x) functions
convert in the other direction (if the current host has the same byte ordering as
network order, the functions do nothing):

>>> import socket
>>> socket.htons(20000)    # Convert a 16-bit value
8270
>>> socket.htonl(20000)    # Convert a 32-bit value
541982720
>>> socket.ntohl(541982720)
20000
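Conversion to network order amounts to laying the number out big-endian and then reading those bytes back in the host's own order. This Python 3 sketch does that with struct and compares the result with the socket functions; my_htons and my_htonl are hypothetical names.

```python
import socket
import struct
import sys

def my_htons(x):
    # Write the 16-bit value big-endian ("network order"), then read
    # the same two bytes back using this machine's native byte order
    return int.from_bytes(struct.pack('!H', x), sys.byteorder)

def my_htonl(x):
    return int.from_bytes(struct.pack('!I', x), sys.byteorder)

print(my_htons(20000), socket.htons(20000))
```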


Example: A Multicast Chat Application 

The example in this section combines material from several chapters to create a
chat application that also enables you to draw on a shared whiteboard, as shown in
Figure 15-1.



Figure 15-1: The chat/whiteboard application in action 


Instead of using a client/server model, the program uses multicast sockets for its
communication. When you send a message to a multicast address (those addresses
in the range from 224.0.0.1 to 239.255.255.255, inclusive), the message is sent to all
computers that have joined that particular multicast group. This provides a simple
way to send messages to any number of other computers, without having to keep
track of which computers are listening. (This could also be considered a security
hole: were this a “real-world” application, you’d want to encrypt the messages or
use some other means to prevent eavesdropping.)
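Because the first byte of every multicast address lies between 224 and 239, a range check is short. A Python 3 sketch; is_multicast is a hypothetical helper.

```python
import socket

def is_multicast(address):
    # Indexing a bytes object gives an int: the first byte of the address
    first_byte = socket.inet_aton(address)[0]
    return 224 <= first_byte <= 239

print(is_multicast('235.0.50.5'), is_multicast('10.0.0.6'))   # True False
```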

Save the program in Listing 15-1 to a file named multichat.py. To start the
application, specify on the command line your name or alias and your color. The color is
passed to Tkinter (the module in charge of the user interface), so normal color
names such as blue or red work, but you can also use any of Tkinter’s niftier colors:

C:\temp> python multichat.py Bob SlateBlue4

You don’t need several computers to try this program out; just start multiple copies 
and watch them interact. 

Cross-Reference: This application uses Tkinter for its user interface. To learn more
about Tkinter, see Chapters 19 and 20. It also uses threads, which you can learn about
in Chapter 26. Finally, read Chapter 12 to learn about serializing Python objects
with pickle and cPickle.


Listing 15-1: multichat - Multicast chat/whiteboard application


from Tkinter import *
from socket import *
import cPickle, threading, sys

# Each message is a command + data
CMD_JOINED,CMD_LEFT,CMD_MSG,CMD_LINE,CMD_JOINRESP = range(5)
people = {} # key = (ipaddr,port), value = (name,color)

def sendMsg(msg):
    sendSock.send(msg,0)

def onQuit():
    'User clicked Quit button'
    sendMsg(chr(CMD_LEFT)) # Notify others that I'm leaving
    root.quit()

def onMove(e):
    'Called when LButton is down and mouse moves'
    global lastLine,mx,my
    canvas.delete(lastLine) # Erase temp line
    mx,my = e.x,e.y
    # Draw a new temp line
    lastLine = \
        canvas.create_line(dx,dy,mx,my,width=2,fill='Black')

def onBDown(e):
    'User pressed left mouse button'
    global lastLine,dx,dy,mx,my
    canvas.bind('<Motion>',onMove) # Start receiving move msgs
    dx,dy = e.x,e.y
    mx,my = e.x,e.y
    # Draw a temporary line
    lastLine = \
        canvas.create_line(dx,dy,mx,my,width=2,fill='Black')

def onBUp(e):
    'User released left mouse button'
    canvas.delete(lastLine) # Erase the temporary line
    canvas.unbind('<Motion>') # No more move msgs, please!
    # Send out the draw-a-line command
    sendMsg(chr(CMD_LINE)+cPickle.dumps((dx,dy,e.x,e.y),1))

def onEnter(foo):
    'User hit the [Enter] key'
    sendMsg(chr(CMD_MSG)+entry.get())
    entry.delete(0,END) # Clear the entry widget

def setup(root):
    'Creates the user interface'
    global msgs,entry,canvas

    # The big window holding everybody's messages
    msgs = Text(root,width=60,height=20)
    msgs.grid(row=0,column=0,columnspan=3)

    # Hook up a scrollbar to see old messages
    s = Scrollbar(root,orient=VERTICAL)
    s.config(command=msgs.yview)
    msgs.config(yscrollcommand=s.set)
    s.grid(row=0,column=3,sticky=N+S)

    # Where you type your message
    entry = Entry(root)
    entry.grid(row=1,column=0,columnspan=2,sticky=W+E)
    entry.bind('<Return>',onEnter)
    entry.focus_set()

    b = Button(root,text='Quit',command=onQuit)
    b.grid(row=1,column=2)

    # A place to draw
    canvas = Canvas(root,bg='White')
    canvas.grid(row=0,column=5)

    # Notify me of button press and release messages
    canvas.bind('<ButtonPress-1>',onBDown)
    canvas.bind('<ButtonRelease-1>',onBUp)

def msgThread(addr,port,name):
    'Listens for and processes messages'

    # Create a listen socket
    s = socket(AF_INET, SOCK_DGRAM)
    s.setsockopt(SOL_SOCKET,SO_REUSEADDR,1)
    s.bind(('',port))

    # Join the multicast group ('0.0.0.0' = any local interface)
    s.setsockopt(SOL_IP,IP_ADD_MEMBERSHIP,\
                 inet_aton(addr)+inet_aton('0.0.0.0'))

    while 1:
        # Get a msg and strip off the command byte
        msg,msgFrom = s.recvfrom(2048)
        cmd,msg = ord(msg[0]),msg[1:]

        if cmd == CMD_JOINED: # New join
            msgs.insert(END,'(%s joined the chat)\n' % msg)
            # Introduce myself
            sendMsg(chr(CMD_JOINRESP)+ \
                    cPickle.dumps((name,myColor),1))

        elif cmd == CMD_LEFT: # Somebody left
            who = people[msgFrom][0]
            if who == name: # Hey, _I_ left, better quit
                break
            msgs.insert(END,'(%s left the chat)\n' % \
                        who,'color_'+who)

        elif cmd == CMD_MSG: # New message to display
            who = people[msgFrom][0]
            msgs.insert(END,who,'color_%s' % who)
            msgs.insert(END,': %s\n' % msg)

        elif cmd == CMD_LINE: # Draw a line
            dx,dy,ex,ey = cPickle.loads(msg)
            canvas.create_line(dx,dy,ex,ey,width=2,\
                               fill=people[msgFrom][1])

        elif cmd == CMD_JOINRESP: # Introducing themselves
            people[msgFrom] = cPickle.loads(msg)
            who,color = people[msgFrom]
            # Create a tag to draw text in their color
            msgs.tag_configure('color_' + who,foreground=color)

    # Leave the multicast group
    s.setsockopt(SOL_IP,IP_DROP_MEMBERSHIP,\
                 inet_aton(addr)+inet_aton('0.0.0.0'))

if __name__ == '__main__':
    argv = sys.argv
    if len(argv) < 3:
        print 'Usage:',argv[0],'<name> <color> '\
              '[addr=<multicast address>] [port=<port>]'
        sys.exit(1)

    global name, addr, port, myColor
    addr = '235.0.50.5' # Default IP address
    port = 54321 # Default port
    name,myColor = argv[1:3]
    for arg in argv[3:]:
        if arg.startswith('addr='):
            addr = arg[len('addr='):]
        elif arg.startswith('port='):
            port = int(arg[len('port='):])

    # Start up a thread to process messages
    threading.Thread(target=msgThread,\
                     args=(addr,port,name)).start()

    # This is the socket over which we send out messages
    global sendSock
    sendSock = socket(AF_INET,SOCK_DGRAM)
    sendSock.setsockopt(SOL_SOCKET,SO_REUSEADDR,1)
    sendSock.connect((addr,port))

    # Don't let the packets die too soon
    sendSock.setsockopt(SOL_IP,IP_MULTICAST_TTL,2)

    # Create a Tk window and create the GUI
    root = Tk()
    root.title('%s chatting on channel %s:%d' % \
               (name,addr,port))
    setup(root)

    # Join the chat!
    sendMsg(chr(CMD_JOINED)+name)
    root.mainloop()


Note: Although this application will work on a local network, it may have trouble
working between computers on the Internet. Some routers are configured to ignore
multicast data packets, and the time-to-live (TTL) setting for the packets must be
high enough to make the necessary number of hops between each computer.
As with most Python programs, this one packs a lot of punch in very few lines of
code (it weighs in at about 120 lines, ignoring comments). The first thing to note is
the msgThread function, which creates a socket to listen for incoming multicast
messages. It uses the SO_REUSEADDR socket option to enable you to run multiple
copies on one computer (otherwise, bind would complain that someone else was
already bound to that address and port). It also uses IP_ADD_MEMBERSHIP to join a
multicast group, and IP_DROP_MEMBERSHIP to leave it. The first byte of each
message is a predefined command character, which msgThread uses to determine what
to do with the message.
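The value passed with IP_ADD_MEMBERSHIP is the ip_mreq structure from Table 15-2: the group address followed by a local interface address. The listing builds those bytes inline; this Python 3 sketch factors them out. make_mreq and open_listener are hypothetical names, open_listener uses IPPROTO_IP (the same value as SOL_IP on Linux), and actually joining a group requires a multicast-capable network interface.

```python
import socket

def make_mreq(group, interface='0.0.0.0'):
    # struct ip_mreq: 4-byte group address + 4-byte local interface address;
    # '0.0.0.0' lets the operating system choose the interface
    return socket.inet_aton(group) + socket.inet_aton(interface)

def open_listener(group, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(('', port))
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                 make_mreq(group))
    return s

print(make_mreq('235.0.50.5'))
```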

When you type a message into the text entry box at the bottom of the dialog box,
onEnter sends the text from the entry box to the multicast channel. Likewise,
pressing the left mouse button, dragging a line, and releasing it causes onBUp to
send the message to draw a new line. Note that neither of these actually displays a
message or draws a line; they just send a message to the multicast group, and all
running copies, including the one that originated the message, receive the message
and process it. The socket that sends these messages doesn’t need to join the
multicast group; anyone can send to a group, but only members can receive messages.

When msgThread calls recvfrom to get a new message, it also gets the IP address
and port of the sender. The program uses this tuple as a dictionary key to map to
the name and color of the sender (each line is drawn in the sender’s color, as is that
user’s name when they send a text message).

One final thing to note is how the listening thread decides when to shut down.
When you click the Quit button, the application notifies everyone that you are
leaving the chat group. Your listener also hears this message, and recognizing that
the sender is itself, it stops waiting for more messages.


Using SocketServers 

The SocketServer module defines a base class for a group of socket server 
classes — classes that wrap up and hide the details of listening for, accepting, and 
handling incoming socket connections. 

The SocketServer family 

TCPServer and UDPServer are SocketServer subclasses that handle TCP and UDP 
messages, respectively. 

Note: SocketServer also provides UnixStreamServer (a child class of TCPServer)
and UnixDatagramServer (a child of UDPServer), which are the same as their
parent classes except that the listening socket is created with a family of AF_UNIX
instead of AF_INET.
By default, the socket servers handle connections one at a time, but you can use the
ThreadingMixIn and ForkingMixIn classes to create threading or forking versions
of any SocketServer. In fact, the SocketServer module helpfully provides the
following classes to save you the trouble: ForkingUDPServer, ForkingTCPServer,
ThreadingUDPServer, ThreadingTCPServer, ThreadingUnixStreamServer, and
ThreadingUnixDatagramServer. Obviously, the threading versions work only on
platforms that support threads, and the forking versions work on platforms that
support os.fork.
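Building your own combination works the same way the module's ready-made classes are built. This sketch assumes Python 3, where the module is named socketserver; MyThreadingTCPServer is a hypothetical name.

```python
import socketserver

# The mix-in comes first so that its process_request(), which starts a
# thread per request, overrides the one TCPServer inherits
class MyThreadingTCPServer(socketserver.ThreadingMixIn,
                           socketserver.TCPServer):
    daemon_threads = True   # handler threads won't keep the program alive

print([cls.__name__ for cls in MyThreadingTCPServer.__mro__])
```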

Cross-Reference: See Chapter 7 for an overview of mix-in classes, Chapter 11 for
forking, and Chapter 26 for threads.

SocketServers handle incoming connections in a generic way; to make them useful,
you provide your own request handler class, to which the server passes each socket to handle. The
BaseRequestHandler class in the SocketServer module is the parent class of all
request handlers. Suppose, for example, that you need to write a multithreaded e-mail
server. First you create MailRequestHandler, a subclass of BaseRequestHandler,
and then you pass it to a newly created SocketServer:


import SocketServer

... # Create your MailRequestHandler class here

addr = ('175.15.30.2', 25) # Listen address and port
server = SocketServer.ThreadingTCPServer(addr,
                                         MailRequestHandler)
server.serve_forever()


Each time a new connection comes in, the server creates a new MailRequestHandler
instance object and calls its handle() method so it can process the new request.
Because the server is a ThreadingTCPServer, it starts a separate thread for each
new request, so multiple requests are processed simultaneously. Instead of calling
serve_forever, you can also call handle_request(), which waits for, accepts, and
processes a single connection. serve_forever merely calls handle_request in an
infinite loop.

Don’t worry too much about the details of the request handler just yet; the next
section covers everything you need to know.
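A minimal sketch of building a threading server by hand follows, assuming Python 3, where this module is named socketserver rather than SocketServer. ThreadingMixIn is listed before TCPServer so that its process_request(), which starts one thread per connection, is the one that runs; ThreadingTCPServer is built the same way. The names EchoHandler, ThreadedEchoServer, and run_in_background are hypothetical, and shutdown(), which stops serve_forever() from another thread, is a later addition not found in the early Python versions this chapter describes.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: echo one chunk of data back to the client."""

    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data)


# ThreadingMixIn comes first so its process_request() (one thread per
# connection) overrides the synchronous one inherited from TCPServer.
class ThreadedEchoServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True  # handler threads won't keep the program alive


def run_in_background(server):
    """Start server.serve_forever() on a daemon thread and return the thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; server_address reports the real one.
    server = ThreadedEchoServer(("127.0.0.1", 0), EchoHandler)
    run_in_background(server)
    with socket.create_connection(server.server_address) as client:
        client.sendall(b"ping")
        print(client.recv(1024))
    server.shutdown()      # ask serve_forever() to return
    server.server_close()  # close the listening socket
```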

Normally, you can use one of the socket servers as is, but if you need to create your
own subclass, you can override any of the following methods to customize it.

When the server is first created, the __init__ function calls the server_bind()
method to bind the listen socket (self.socket) to the correct address
(self.server_address). It then calls server_activate() to activate the server
(by default, this calls the listen method of the socket).

The socket server doesn’t do anything until the user calls either of the
handle_request or serve_forever methods. handle_request calls
get_request() to wait for and accept a new socket connection, and then calls
verify_request(request, client_address) to see if the server should
process the connection (you can use this for access control; by default,
verify_request always returns true). If it’s okay to process the request,
handle_request then calls process_request(request, client_address), and
then handle_error(request, client_address) if process_request raised an
exception. By default, process_request simply calls finish_request(request,
client_address); the forking and threading mix-in classes override this behavior
to start a new process or thread, and then call finish_request. finish_request
instantiates a new request handler, which in turn calls its handle() method. If you
want to subclass a SocketServer, trace through this sequence of calls once or
twice to make sure it makes sense to you, and review the source code of
SocketServer for help.
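As a sketch of the access control that verify_request makes possible, the server below serves only clients whose IP address appears in an allow list. It assumes Python 3's socketserver module; AllowListServer, GreetingHandler, and allowed_hosts are hypothetical names. A rejected connection is simply closed without a request handler ever being created.

```python
import socketserver


class GreetingHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: send a one-line greeting and finish."""

    def handle(self):
        self.request.sendall(b"welcome\n")


class AllowListServer(socketserver.TCPServer):
    """Hypothetical server that only serves clients from allowed IP addresses."""

    allowed_hosts = ("127.0.0.1",)

    def verify_request(self, request, client_address):
        # client_address is (host, port) for AF_INET sockets. Returning False
        # makes handle_request() close the connection instead of handling it.
        return client_address[0] in self.allowed_hosts
```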

When a SocketServer creates a new request handler, it passes itself (its self
variable) to the handler’s __init__ function, so that the handler can access
information about the server.

The SocketServer’s fileno() method returns the file descriptor of the listen
socket. The address_family member variable specifies the socket family of the
listen socket (for example, AF_INET), and server_address holds the address to
which the listen socket is bound. The socket variable holds the listen socket itself.

Request handlers

Request handlers have setup(), handle(), and finish() methods (none of which
do anything by default) that you can override to add your custom behavior. Normally,
you need to override only the handle method. The BaseRequestHandler’s
__init__ function calls setup() for initialization work, handle() to service the
request, and finish() to perform any cleanup, although finish isn’t called if
handle or setup raises an exception. Keep in mind that a new instance of your
request handler is created for each request.
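The following sketch records the order in which the hooks run. It assumes Python 3's socketserver module, and TracingHandler is a hypothetical name. Serving two connections produces two setup, handle, finish sequences, one for each freshly created handler instance.

```python
import socketserver


class TracingHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler that records the order in which its hooks run."""

    calls = []  # shared by all instances; fine for a demo, not for real servers

    def setup(self):
        self.calls.append("setup")

    def handle(self):
        self.calls.append("handle")
        self.request.sendall(b"ok")

    def finish(self):
        self.calls.append("finish")
```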

The request member variable has the newly accepted socket for stream (TCP)
servers; for datagram (UDP) servers, it is a tuple containing the incoming message
and the listen socket. client_address holds the address of the sender, and
server has a reference to the SocketServer (through which you can access its
members, such as server_address).
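Here is a small UDP sketch that unpacks the (message, socket) tuple described above and replies through the listen socket to client_address. It assumes Python 3's socketserver module; ShoutHandler is a hypothetical name.

```python
import socketserver


class ShoutHandler(socketserver.BaseRequestHandler):
    """Hypothetical UDP handler that replies with the datagram in upper case."""

    def handle(self):
        # For datagram servers, self.request is (message, listen_socket).
        message, sock = self.request
        sock.sendto(message.upper(), self.client_address)
```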

The following example implements EchoRequestHandler, a handler that repeats
back to the remote side any data it sends:

>>> import SocketServer
>>> class EchoRequestHandler(SocketServer.BaseRequestHandler):
        def handle(self):
            print 'Got new connection!'
            while 1:
                msg = self.request.recv(1024)
                if not msg:
                    break
                print 'Received:', msg
                self.request.send(msg)
            print 'Done with connection'

>>> server = SocketServer.ThreadingTCPServer(\
        ('127.0.0.1',12321),EchoRequestHandler)
>>> server.handle_request() # It'll wait here for a connection
Got new connection!
Received: Hello!
Received: I like Tuesdays!
Done with connection

In another Python interpreter, you can connect to the server and try it out: 

>>> from socket import *
>>> s = socket(AF_INET,SOCK_STREAM)
>>> s.connect(('127.0.0.1',12321))
>>> s.send('Hello!')
6
>>> print s.recv(1024)
Hello!
>>> s.send('I like Tuesdays!')
16
>>> print s.recv(1024)
I like Tuesdays!
>>> s.close()

The SocketServer module also defines two subclasses of BaseRequestHandler:
StreamRequestHandler and DatagramRequestHandler. These override the setup
and finish methods and create two file objects, rfile and wfile, that you can use
for reading and writing data to the client, instead of using the usual socket methods.
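With StreamRequestHandler, a line-oriented echo handler can be written with file operations instead of recv and send. This sketch assumes Python 3's socketserver module, where rfile and wfile deal in bytes; LineEchoHandler is a hypothetical name.

```python
import socketserver


class LineEchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler that echoes each line using rfile and wfile."""

    def handle(self):
        # rfile and wfile are file objects wrapping self.request (the socket);
        # iterating over rfile yields lines until the client stops sending.
        for line in self.rfile:
            self.wfile.write(line)
```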


Processing Web Browser Requests 

Now that you have a SocketServer, what do you do with it? Why, extend it, of
course! The standard Python library comes with BaseHTTPServer,
SimpleHTTPServer, and CGIHTTPServer modules that implement increasingly
complex Web server request handlers.

Most likely, you would use them as starting points on which to build, but to some 
extent they do work on their own as well. For example, how many lines does it take 
to implement a multithreaded Web server that supports running CGI scripts? Well,
at a bare minimum, it takes the following: 

import SocketServer,CGIHTTPServer
SocketServer.ThreadingTCPServer(('127.0.0.1',80),\
    CGIHTTPServer.CGIHTTPRequestHandler).serve_forever()

Point your Web browser to http://127.0.0.1/file (where file is the name of 
some text file in your current directory) and verify that it really does work. 
BaseHTTPRequestHandler

The starting class for a Web server request handler is BaseHTTPRequestHandler
(in the BaseHTTPServer module), a child of StreamRequestHandler. This class
accepts an HTTP connection (usually from a Web browser), reads and extracts the
headers, and calls the appropriate method to handle the request.

Subclasses of BaseHTTPRequestHandler should not override the __init__ or
handle methods, but should instead implement a method for each HTTP command
they need to handle. For each HTTP command (GET, POST, and so on),
BaseHTTPRequestHandler calls its do_<command> method, if present. For
example, if your subclass needs to support the HTTP PUT command, just add a
do_PUT() method to your subclass and it will automatically be called for any
HTTP PUT requests.

The request handler stores the original request line in its raw_requestline instance
variable, and its parts in command (GET, POST, and so on), path (for example,
/index.html), and request_version (for example, HTTP/1.0). headers is an instance
of mimetools.Message, and contains the parsed version of the request headers.

Cross-Reference See Chapter 17 for more information about the mimetools.Message class.

Alternatively, you can specify a different class to use for reading and parsing the
headers by changing the value of the BaseHTTPRequestHandler.MessageClass
class variable.

Use the rfile and wfile objects to read and write data. If the request has
additional data beyond the request headers, rfile will be positioned at the beginning
of that data by the time the handler calls the appropriate do_<command> method.

BaseHTTPRequestHandler uses the value in server_version when writing out a
Server response header; you can customize this from its default of BaseHTTP/0.x.
Additionally, the protocol_version variable defaults to HTTP/1.0, but you can set
it to a different version if needed.

In your do_<command> method, the first output you send should be via the
send_response(code[, message]) method, where code is an HTTP code (such as
200) and message is an optional text message explaining the code. (If the request is
invalid, you can instead call send_error(code[, message]), and then return from
the command method.) When you call send_response, BaseHTTPRequestHandler
adds in Date and Server headers.

After a call to send_response, you can call send_header(key, value) as needed
to write out MIME headers; call end_headers() when you’re done:

def do_GET(self):
    data = '<html><body>Hello!</body></html>' # the page to send
    self.send_response(200)
    self.send_header('Content-type','text/html')
    self.send_header('Content-length',str(len(data)))
    self.end_headers()
    self.wfile.write(data) # send the rest of the data
Most Web servers generate logs for later analysis. Call the log_request([code[,
size]]) method to log a successful request (including the size, if known, makes
the logs more useful). log_message(format, arg0, arg1, ...) is a general-purpose
logging method; the format and arguments are similar to normal Python string
formatting:

self.log_message('%s: %d', 'Time taken', 425)

Each request is automatically logged to stderr using the NCSA httpd logging
format.
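Putting these pieces together, here is a minimal, self-contained handler sketch. It assumes Python 3, where BaseHTTPServer became http.server and wfile expects bytes; the class name HelloHandler and the HelloServer/0.1 version string are hypothetical. It answers every GET with a plain-text page naming the requested path, and logs the body size through log_message.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a plain-text page."""

    server_version = "HelloServer/0.1"  # used in the Server response header

    def do_GET(self):
        body = ("You asked for %s\n" % self.path).encode("utf-8")
        self.send_response(200)  # also writes the Date and Server headers
        self.send_header("Content-type", "text/plain; charset=utf-8")
        self.send_header("Content-length", str(len(body)))
        self.end_headers()       # blank line that ends the headers
        self.wfile.write(body)
        self.log_message("%s: %d", "Body bytes", len(body))
```

To try it by hand, run HTTPServer(("127.0.0.1", 8000), HelloHandler).serve_forever() and point a browser at http://127.0.0.1:8000/anything.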

SimpleHTTPRequestHandler

Whereas the BaseHTTPRequestHandler doesn’t actually handle any HTTP
commands, SimpleHTTPRequestHandler (in the SimpleHTTPServer module) adds
support for both HEAD and GET commands by sending back to the client requested
files that reside in the current working directory or any of its subdirectories. If the
requested file is actually a directory, SimpleHTTPRequestHandler generates, on
the fly, a Web page containing a directory listing and sends it back to the client.

Try the following example to see this in action. This code starts a Web server on 
port 8000, and then opens a Web browser and begins browsing in the current 
working directory. Because the server continuously loops to serve requests, the 
example starts the server on a separate thread so you can still launch a Web
browser: 

>>> import webbrowser,threading,SimpleHTTPServer
>>> def go():
        t = SimpleHTTPServer.test
        threading.Thread(target=t).start()
        webbrowser.open('http://127.0.0.1:8000')

>>> go() # Below is the output after browsing around a little
Serving HTTP on port 8000 ...
endor - - [28/Dec/2000 18:00:48] "GET /3dsmax3/ HTTP/1.1" 200 -
endor - - [28/Dec/2000 18:00:50] "GET /3dsmax3/Maxsdk/ HTTP/1.1" 200 -
endor - - [28/Dec/2000 18:00:53] "GET /3dsmax3/Maxsdk/Include/ HTTP/1.1" 200 -

The test() function in the SimpleHTTPServer module simply starts a new server
on port 8000.

In addition to the variables inherited from BaseHTTPRequestHandl er, this class 
has an extensions_map dictionary that maps file extensions to MIME data types,
so that the user’s Web browser will correctly handle the file it receives. You can 
expand this list to add new types you want to support. 
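One way to extend extensions_map is sketched below, assuming Python 3's http.server module (SimpleHTTPServer in Python 2). The subclass copies the inherited dictionary before adding entries so the base class is left untouched; the class name and the extra mappings are illustrative choices, not defaults. In recent Python 3 versions the inherited map is small and guess_type falls back to the mimetypes module, but entries in extensions_map are still checked first.

```python
from http.server import SimpleHTTPRequestHandler


class SourceFriendlyHandler(SimpleHTTPRequestHandler):
    """Hypothetical handler that serves extra file types with chosen MIME types."""

    # Copy the inherited dictionary so the base class is left unchanged.
    extensions_map = dict(SimpleHTTPRequestHandler.extensions_map)
    extensions_map.update({
        ".py": "text/plain",   # serve Python source as plain text
        ".log": "text/plain",
    })
```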
CGIHTTPRequestHandler

The CGIHTTPRequestHandler (in the CGIHTTPServer module) takes
SimpleHTTPRequestHandler one step further and adds support for executing
CGI scripts. The CGI (Common Gateway Interface) is a standard for executing
server-side programs that can process input from the user’s browser (saving data
they entered in an HTML form, for example).

Caution Before you ever make a Web server open to public use, take the time to learn
about what security risks are involved. This warning is doubly strong for modules
such as CGIHTTPRequestHandler that can execute arbitrary Python code; even
the smallest security hole is an invitation for intruders.

For each GET or POST command that comes in, CGIHTTPRequestHandler checks
whether the specified file is actually a CGI program and, if so, launches it as an
external program. If it is not, the file contents are sent back to the browser normally. Note
that the POST method is supported for CGI programs only.

To decide if a file is a valid CGI program, CGIHTTPRequestHandler checks the file’s
path against the cgi_directories member list, which, by default, contains the
directories /cgi-bin and /htbin (you can add other directories if you want). If the file is
in one of those directories or any of their subdirectories and is either a Python
module or an executable file, the file is executed and its output returned to the client.

Example: form handler CGI script

The example in this section shows CGIHTTPRequestHandler at work. Follow these 
steps to try it out: 

1. Listing 15-2 is a tiny HTML form that asks you to enter your name. Save the file
to disk (anywhere you want) as form.html. I saved it to c:\temp, so in the
following steps, replace c:\temp with the directory you chose.

2. In the same directory, create a subdirectory called cgi-bin:
md c:\temp\cgi-bin (from an MS-DOS prompt)

3. Listing 15-3 is a small CGI script; save it to your new cgi-bin directory as
handleForm.py.

4. Switch to your original directory (c:\temp), start up a Python interpreter, and 
enter the following lines to start a Web server: 

>>> import CGIHTTPServer 
>>> CGIHTTPServer.test() 

5. Open a Web browser and point it to http://127.0.0.1:8000/form.html to
display the simple Web page shown in Figure 15-2.
Figure 15-2: The Python Web server returned this page; clicking Go
executes the CGI script.

6. Enter your name in the text box and click Go. The Web server executes the 
Python CGI script and displays the results shown in Figure 15-3. 



Figure 15-3: The Python Web server ran the CGI script and returned 
the results. 
Listing 15-2: form.html - A simple HTML form


<html><body>
<form method=GET
      action="http://127.0.0.1:8000/cgi-bin/handleForm.py">
Your name: <input name="User">
<input type="Submit" value="Go!">
</form>
</body></html>


Listing 15-3: handleForm.py - A Python CGI script


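import os, urllib

print "Content-type: text/html\r\n"
print "<html><body>"
name = os.environ.get('QUERY_STRING','')
# QUERY_STRING looks like "User=Eric+Idle"; strip "User=" and decode the rest
print 'Hello, %s!<p>' % urllib.unquote_plus(name[len('User='):])
print '</body></html>'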


To make use of this functionality, you should read up on CGI (which is certainly not 
specific to Python). Although a complete discussion is outside the scope of this 
chapter, the following few hints will help get you started: 

♦ CGIHTTPRequestHandler stores the user information (including form values)
in environment variables. (Write a simple CGI script to print out all variables
and their values to test this.)

♦ Anything you write to stdout (via print or sys.stdout.write) is returned
to the client, and it can be text or binary data.

♦ CGIHTTPRequestHandler outputs some response headers for you, but you
can add others if needed (such as the Content-type header in the example).

♦ After the headers, you must output a blank line before any data.

♦ On UNIX, external programs run with the nobody user ID.
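The first hint suggests writing a CGI script that dumps its environment. A minimal sketch of such a script follows, in Python 3 syntax; environ_page is a hypothetical helper, and the values are HTML-escaped because some of them (QUERY_STRING, for example) come straight from the browser. Saved in the cgi-bin directory, it would be launched like any other CGI program.

```python
import html
import os
import sys


def environ_page(environ):
    """Build a CGI response listing every environment variable (a debugging aid)."""
    lines = ["Content-type: text/html", ""]  # header, then the required blank line
    lines.append("<html><body><pre>")
    for key in sorted(environ):
        # Escape values: QUERY_STRING and similar variables are user-controlled.
        lines.append(html.escape("%s=%s" % (key, environ[key])))
    lines.append("</pre></body></html>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sys.stdout.write(environ_page(os.environ))
```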


Handling Multiple Requests Without Threads

Although threads can help the Web servers in the previous sections handle more
than one connection simultaneously, the program usually sits around waiting for
data to be transmitted across the network. (Instead of being CPU bound, the
program is said to be I/O bound.) In situations where your program is I/O bound, a lot
of CPU time is wasted switching between threads that are just waiting until they can
read or write more data to a file or socket. In such cases, it may be better to use the
select and asyncore modules. These modules still let you process multiple
requests at a time, but avoid all the senseless thread switching.

The select(inList, outList, errList[, timeout]) function in the select
module takes three lists of objects that are waiting to perform input or output (or
want to be notified of errors). select returns three lists, subsets of the originals,
containing only those objects that can now perform I/O without blocking. If the
timeout parameter is given (a floating-point number indicating the number of
seconds to wait) and is non-zero, select returns when an object can perform I/O
or when the time limit is reached (whereupon empty lists are returned). A timeout
value of 0 does a quick check without blocking.
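To make the select() description concrete, here is a minimal single-threaded echo server sketch in Python 3. The helper echo_loop is hypothetical and runs a fixed number of passes so that it can be exercised in a test; a real server would loop forever. It watches only for readability, which is enough for small echoes.

```python
import select


def echo_loop(listener, rounds, timeout=1.0):
    """Hypothetical helper: run `rounds` passes of a select()-driven echo server.

    One thread services the listening socket and every client socket;
    select() reports which of them can be read without blocking.
    """
    clients = []
    try:
        for _round in range(rounds):
            readable, _writable, _errors = select.select(
                [listener] + clients, [], [], timeout)
            for sock in readable:
                if sock is listener:
                    conn, _addr = listener.accept()  # a connection is waiting
                    clients.append(conn)
                else:
                    data = sock.recv(1024)           # data (or EOF) is waiting
                    if data:
                        # Fine for small echoes; a fuller server would also put the
                        # socket in the output list and wait until it is writable.
                        sock.sendall(data)
                    else:                            # empty read: the peer closed
                        clients.remove(sock)
                        sock.close()
    finally:
        for sock in clients:
            sock.close()
```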

The three lists hold input, output, and error objects, respectively (objects that are 
interested in reading data, writing data, or in being notified of errors that occurred). 
Any of the three lists can be empty, and the objects can be integer file descriptors 
or filelike objects with a fileno() method that returns a valid file descriptor.

See "Working with File Descriptors" in Chapter 10 for more information. 


By using select, you can start several read or write operations and, instead of
blocking until you can read or write more, you can continue to do other work. This
way, your I/O-bound program spends as much time as possible being driven by its
performance-limiting factor (I/O) instead of a more artificial factor (switching
between threads). With select, it is possible to write reasonably high-performance
servers in Python.

Note On Windows systems, select() works on socket objects only. On UNIX systems,
however, it also works on other file descriptors, such as named pipes.

A slightly more efficient alternative to select is the select.poll() function,
which returns a polling object (available on UNIX platforms). After you create a
polling object, you call the register(fd[, eventmask]) method to register a
particular file descriptor (or object with a fileno() method). The optional eventmask
is constructed by bitwise OR-ing together any of the following: select.POLLIN (for
input), select.POLLPRI (urgent input), select.POLLOUT (for output), or
select.POLLERR.

You can register as many file descriptors as needed, and you can remove them from
the object by calling the polling object’s unregister(fd) method.

Call the polling object’s poll([timeout]) method to see which file descriptors, if
any, are ready to perform I/O without blocking. Unlike select, poll measures its
timeout in milliseconds. poll returns a possibly empty list
of tuples of the form (fd, event), an entry for each file descriptor whose state has
changed. The event will be a bitwise-OR of any of the eventmask flags as well as
POLLHUP (hang up) or POLLNVAL (an invalid file descriptor).
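A minimal poll() sketch follows (Python 3, UNIX only); wait_readable is a hypothetical helper. It registers one socket for input, checks the returned (fd, event) tuples, and unregisters the socket again. Note that the timeout passed to poll is in milliseconds.

```python
import select


def wait_readable(sock, timeout_ms):
    """Hypothetical helper: True if sock becomes readable within timeout_ms.

    select.poll() is available on UNIX but not on Windows.
    """
    mask = select.POLLIN | select.POLLPRI
    poller = select.poll()
    poller.register(sock.fileno(), mask)   # an object with fileno() also works
    try:
        # poll() returns a (possibly empty) list of (fd, event) tuples.
        for fd, event in poller.poll(timeout_ms):
            if fd == sock.fileno() and event & mask:
                return True
        return False
    finally:
        poller.unregister(sock.fileno())
```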
asyncore

If you’ve never used select or poll before, it may seem complicated or confusing.
To help in creating select-based socket clients and servers, the asyncore module
takes care of a lot of the dirty work for you.

asyncore defines the dispatcher class, a wrapper around a normal socket object
that you subclass to handle messages about when the socket can be read or
written without blocking. Because it is a wrapper around a socket, you can often
treat a dispatcher object like a normal socket (it has the usual connect(addr),
send(data), recv(bufsize), listen([backlog]), bind(addr), accept(), and
close() methods).

Although the dispatcher is a wrapper around a socket, you still need to create the
underlying socket (either the caller can do it or you can create it in the dispatcher’s
constructor) by calling the create_socket(family, type) method:

d = myDispatcher()
d.create_socket(AF_INET,SOCK_STREAM)

create_socket creates the socket and sets it to nonblocking mode. 

asyncore calls methods of a dispatcher object when different events occur. When
the socket can be written to without blocking, for example, the handle_write()
method is called. When data is available for reading, handle_read() is called. You
can also implement handle_connect() for when a socket connects successfully,
handle_close() for when it closes, and handle_accept() for when a call to
socket.accept will not block (because an incoming connection is available and
waiting).

asyncore calls the readable() and writable() methods of the dispatcher object
to see if it is interested in reading or writing data, respectively (by default, both
methods always return 1). You can override these so that, for example, asyncore
doesn’t waste time checking for data if you’re not even trying to read any.

In order for asyncore to fire events off to any dispatcher objects, you need to call
asyncore.poll([timeout]) (on UNIX, you can also call asyncore.poll2([timeout])
to use poll instead of select) or asyncore.loop([timeout]). These
functions use the select module to check for a change in I/O state and then fire off
the appropriate events to the corresponding dispatcher objects. poll checks once
(with a default timeout of 0 seconds), but loop keeps checking until all of the
dispatcher objects have been closed; its timeout (a default of 30 seconds) applies
to each individual check.

The best way to absorb all this is by looking at an example. Listing 15-4 is a very
simple asynchronous Web page retrieval class that retrieves the index.html page
from a Web site and writes it to disk (including the Web server’s response headers).
Listing 15-4: asyncget.py - Asynchronous HTML page retriever


import asyncore, socket

class AsyncGet(asyncore.dispatcher):
    def __init__(self, host):
        asyncore.dispatcher.__init__(self)
        self.host = host
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host,80))
        self.request = 'GET /index.html HTTP/1.0\r\n\r\n'
        self.outf = None
        print 'Requesting index.html from',host

    def handle_connect(self):
        print 'Connect',self.host

    def handle_read(self):
        if not self.outf:
            print 'Creating',self.host
            # Binary mode, so the raw response is saved unchanged
            self.outf = open(self.host,'wb')
        data = self.recv(8192)
        if data:
            self.outf.write(data)

    def writable(self): # asyncore calls writable(), not writeable()
        return len(self.request) > 0

    def handle_write(self):
        # Not all data might be sent, so track what did make it
        num_sent = self.send(self.request)
        self.request = self.request[num_sent:]

    def handle_close(self):
        asyncore.dispatcher.close(self)
        print 'Socket closed for',self.host
        if self.outf:
            self.outf.close()

# Now retrieve some pages
AsyncGet('www.yahoo.com')
AsyncGet('www.cnn.com')
AsyncGet('www.python.org')
asyncore.loop() # Wait until all are done
Here’s some sample output:

C:\temp>asyncget.py
Requesting index.html from www.yahoo.com
Requesting index.html from www.cnn.com
Requesting index.html from www.python.org
Connect www.yahoo.com
Connect www.cnn.com
Creating www.yahoo.com
Connect www.python.org
Creating www.cnn.com
Creating www.python.org
Socket closed for www.yahoo.com
Socket closed for www.python.org
Socket closed for www.cnn.com

Notice that the requests did not all finish in the same order they were started.
Rather, they each made progress according to when data was available. By being
event-driven, the I/O-bound program spends most of its time working on its
greatest performance boundary (I/O), instead of wasting time with needless thread
switching.


Summary 

If you’ve done any network programming in some other languages, you’ll find
that doing the same thing in Python takes a lot less effort and produces fewer bugs.
Python has full support for standard networking functionality, as well as utility
classes that do much of the work for you. In this chapter, you:

♦ Converted IP addresses to registered names and back.

♦ Created sockets and sent messages between them.

♦ Used SocketServers to quickly build custom servers.

♦ Built a working Web server in only a few lines of Python code.

♦ Used select to process multiple socket requests without threads.

The next chapter looks at more of Python’s higher-level support for Internet
protocols, including modules that hide the nasty details of “speaking” protocols such as
HTTP, FTP, and telnet.



Speaking Internet Protocols


On the Internet, people use various protocols to transfer
files, send e-mail, and request resources from the World
Wide Web. Python provides libraries to help work with
Internet protocols. This chapter shows how you can write
Internet programs without having to handle lower-level
TCP/IP details such as sockets. Supported protocols include
HTTP, POP3, SMTP, FTP, and Telnet. Python also provides
useful CGI scripting abilities.


Python's Internet Protocol Support

Python’s standard libraries make it easy to use standard
Internet protocols such as HTTP, FTP, and Telnet. These
libraries are built on top of the socket library, and enable
you to write networked programs with a minimum of
low-level code.

Each Internet protocol is documented in a numbered request for
comment (RFC). The name is a bit misleading for established
protocols such as POP and FTP, as these protocols are widely
implemented, and are no longer under much discussion.

These protocols are quite feature-rich: the RFCs for the
protocols discussed here would fill several hundred printed
pages. The standard Python modules provide a high-level
client for each protocol. However, you may need to know
more about the protocols’ syntax and meaning, and the RFCs
are the best place to learn this information. One good online
RFC repository is at http://www.rfc-editor.org/.



Cross-Reference Refer to Chapter 15 for more information about the
socket module and a quick overview of TCP/IP.
Retrieving Internet Resources

The library urllib provides an easy mechanism for grabbing files from the
Internet. It supports HTTP, FTP, and Gopher requests. Resource requests can take a
long time to complete, so you may want to keep them out of the main thread in an
interactive program.

The simplest way to retrieve a URL is with one line: 

urlretrieve(url[,filename[,callback[,data]]])

The function urlretrieve retrieves the resource located at the address url and
writes it to a file with name filename. For example:

>>> MyURL="http://www.pythonapocrypha.com"
>>> urllib.urlretrieve(MyURL, "pample2.swf")
>>> urllib.urlcleanup() # clean the cache!

If you do not pass a filename to urlretrieve, a temporary filename will be
magically generated for you. The function urlcleanup frees up resources used in calls
to urlretrieve.

The optional parameter callback is a function to call after retrieving each block of a 
file. For example, you could use a callback function to update a progress bar show- 
ing download progress. The callback receives three arguments: the number of 
blocks already transferred, the size of each block (in bytes), and the total size of 
the file (in bytes). Some FTP servers do not return a file size; in this case, the third 
parameter is -1. 
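A progress callback might look like the following sketch, written for Python 3, where urlretrieve lives in urllib.request. make_progress_hook and download are hypothetical names. The hook receives exactly the three arguments described above and turns them into a percentage, recording None when the size is unknown (-1).

```python
from urllib.request import urlretrieve


def make_progress_hook(log):
    """Hypothetical factory: return a urlretrieve callback that records percentages."""

    def hook(blocks_done, block_size, total_size):
        if total_size > 0:
            percent = min(100, blocks_done * block_size * 100 // total_size)
        else:
            percent = None  # the server did not report a size (total_size is -1)
        log.append(percent)

    return hook


def download(url, filename):
    """Fetch url into filename and return the list of recorded progress values."""
    progress = []
    urlretrieve(url, filename, make_progress_hook(progress))
    return progress
```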

Normally, HTTP requests are sent as GET requests. To send a POST request, pass
a value for the optional parameter data. This string should be encoded using
urlencode.

To use a proxy on Windows or UNIX, set the environment variables http_proxy,
ftp_proxy, and/or gopher_proxy to the URL of the proxy server. On a Macintosh,
proxy information from Internet Config is used.

Manipulating URLs 

Special characters are encoded in URLs to ensure they can be passed around easily.
Encoded characters take the form %##, where ## is the ASCII value of the character
in hexadecimal. Use the function quote to encode a string, and unquote to
translate it back to normal, human-readable form:

>>> print urllib.quote("human:nature")
human%3anature
>>> print urllib.unquote("cello%23music")
cello#music
The function quote_plus does the encoding of quote, but also replaces spaces
with plus signs, as required for form values. The corresponding function
unquote_plus decodes such a string:

>>> print urllib.quote_plus("bob+alice forever")
bob%2balice+forever
>>> print urllib.unquote_plus("where+are+my+keys?")
where are my keys?

Data for an HTTP POST request must be encoded in this way. The function
urlencode takes a dictionary of names and values, and returns a properly encoded
string, suitable for HTTP requests:

>>> print urllib.urlencode(
        {"name":"Eric","species":"sea bass"})
species=sea+bass&name=Eric
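In Python 3 these functions moved to urllib.parse, which also provides parse_qs to reverse urlencode. The sketch below assumes Python 3; note that modern versions emit upper-case hex digits (%3A rather than %3a), which decode identically. A list of pairs is used instead of a dictionary so the field order is predictable.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, urlencode

# A list of pairs (rather than a dict) keeps the fields in a predictable order.
form = [("name", "Eric"), ("species", "sea bass")]
body = urlencode(form)        # 'name=Eric&species=sea+bass'
fields = parse_qs(body)       # {'name': ['Eric'], 'species': ['sea bass']}

# Python 3 writes escapes with upper-case hex digits; decoding accepts either case.
colon = quote("human:nature")              # 'human%3Anature'
hash_sign = unquote("cello%23music")       # 'cello#music'
plus = quote_plus("bob+alice forever")     # 'bob%2Balice+forever'
```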

Cross-Reference See the module urlparse, covered in Chapter 17, for more functions to parse
and process URLs.


Treating a URL as a file 

The function urlopen(url[,data]) creates and returns a filelike object for the
corresponding address url. The source can be read like an ordinary file. For
example, the following code reads a Web page and checks the length of the file (the full
HTML text of the page):

>>> Page=urllib.urlopen("http://www.python.org")
>>> print len(Page.read())
339

The data parameter, as for urlretrieve, is used to pass urlencoded data for a
POST request.

The filelike object returned by urlopen provides two bonus methods. The method
geturl returns the real URL, usually the same as the URL you passed in, but
possibly different if a Web page redirected you to another URL. The method info
returns a mimetools.Message object describing the file.

Refer to Chapter 17 for more information about mimetools.
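The sketch below reads a resource and collects its final URL and headers. It assumes Python 3, where urlopen lives in urllib.request and info() returns an email.message.Message rather than a mimetools.Message; fetch is a hypothetical helper, and it works the same way for http:// and file:// URLs.

```python
from urllib.request import urlopen


def fetch(url):
    """Hypothetical helper: read a resource and report its final URL and headers."""
    with urlopen(url) as page:             # file-like object, closed on exit
        body = page.read()
        final_url = page.geturl()          # differs from url after a redirect
        headers = page.info()              # email.message.Message in Python 3
    return body, final_url, headers
```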



URLopeners 

The classes URLopener and FancyURLopener are what you actually build and use
with calls to urlopen and urlretrieve. You may want to subclass them to handle
new addressing schemes. You will probably always use FancyURLopener. It is a
subclass of URLopener that handles HTTP redirections (response codes 301 and
302) and basic authentication (response code 401).

The opener constructor takes, as its first argument, a mapping of schemes (such as
HTTP) to proxies. It also takes the keyword arguments key_file and cert_file,
which, if supplied, allow you to request secure Web pages (using the HTTPS scheme).

Note The default Python build does not currently include SSL support. You must edit
Modules/Setup to include SSL, and then rebuild Python, in order to open https://
addresses with urllib.

Openers provide a method, open(url[,data]), that opens the resource with
address url. The data parameter works as in urllib.urlopen. To open new URL
types, override the method open_unknown(url[,data]) in your subclass. By
default, this method raises an “unknown url type” IOError.

Openers also provide a method retrieve(url[,filename[,hook[,data]]]),
which functions like urllib.urlretrieve.

The HTTP header user agent identifies a piece of client software to a Web server.
Normally, urllib tells the server that it is Python-urllib/1.13 (where 1.13 is the
current version of urllib). If you subclass the openers, you can override this by
setting the version attribute before calling the parent class’s constructor.


Extended URL opening 


The module urllib2 is a new and improved version of urllib. urllib2 provides a
wider array of features, and is easier to extend. The syntax for opening a URL is the
same: urlopen(url[,data]). Here, url can be a string or a Request object.

The Request class gathers HTTP request information (it is very similar to the class
httplib.HTTP). Its constructor has syntax Request(url[,data[,headers]]).
Here, headers must be a dictionary. After constructing a Request, you can call
add_header(name, value) to send additional headers, and add_data(data) to
send data for a POST request. For example:

>>> import urllib2
>>> # Request constructor is picky: "http://" and the
>>> # trailing slash are both required here:
>>> MyRequest=urllib2.Request("http://www.python.org/")
>>> MyRequest.add_header("user-agent","Testing 1 2 3")
>>> URL=urllib2.urlopen(MyRequest)
>>> print URL.readline() # read just a little bit
<HTML>
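For comparison, here is a sketch of the same Request object in Python 3, where urllib2 became urllib.request. It builds the request without sending it, so nothing touches the network; the search parameter q=python is a made-up example. Python 3 removed add_data, so the sketch assigns the data attribute instead.

```python
import urllib.parse
import urllib.request

# Build a request without sending it; nothing here touches the network.
req = urllib.request.Request("http://www.python.org/")
req.add_header("user-agent", "Testing 1 2 3")
# Header names are stored capitalized ("User-agent").
print(req.get_header("User-agent"))   # Testing 1 2 3
print(req.get_method())               # GET

# Attaching a body turns the request into a POST.
req.data = urllib.parse.urlencode({"q": "python"}).encode("ascii")
print(req.get_method())               # POST
# urllib.request.urlopen(req) would send it.
```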

The module urllib2 can handle some fancier HTTP requests, such as basic
authentication. For further details, consult the module documentation.
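As a sketch of basic authentication in Python 3's urllib.request (urllib2's successor), the handler below consults a password manager whenever a server answers with response code 401. The site http://www.example.com/ and the credentials bob/secret are hypothetical; the last lines only ask the password manager which credentials it would offer, so no network access is needed.

```python
import urllib.request

# Hypothetical site and credentials, for illustration only.
passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# realm=None means "use these credentials for any realm at this URL".
passwords.add_password(None, "http://www.example.com/", "bob", "secret")

# The handler answers 401 challenges using the password manager.
auth_handler = urllib.request.HTTPBasicAuthHandler(passwords)
opener = urllib.request.build_opener(auth_handler)

# Offline check: which credentials would be offered for a protected page?
print(passwords.find_user_password("Members Only",
                                   "http://www.example.com/private/"))
# ('bob', 'secret')
# opener.open("http://www.example.com/private/") would send them on a 401.
```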

The module urllib2 is new in Python Version 2.1.
Sending HTTP Requests

HyperText Transfer Protocol (HTTP) is a format for requests that a client (usually a
browser) sends to a server on the World Wide Web. An HTTP request includes
various headers. Headers include information such as the URL of a requested resource,
file formats accepted by the client, and cookies (parameters used to cache
user-specific information). See RFC 2616 for details.

The httplib module lets you build and send HTTP requests and receive server
responses. Normally, you retrieve Web pages using the urllib module, which is
simpler. However, httplib enables you to control headers, and it can handle POST
requests.

Building and using request objects 

Calling HTTP([host[,port]]) constructs and returns an HTTP
request object. The parameter host is the name of a host (such as www.yahoo.com).
The port number can be passed via the port parameter, or parsed from the host
name; otherwise, it defaults to 80. If you construct an HTTP object without
providing a host, you must call its connect(host[,port]) method to connect to a server
before sending a request.

To start a Web request, call the method putrequest(action,URL). Here, action
is the request method, such as GET or POST, and URL is the requested resource,
such as /stuff/junk/index.html.

After starting the request, you can (and usually will) send one or more headers, by
calling putheader(name, value[, anothervalue, ...]). Then, whether you sent
headers or not, you call the endheaders method. For example, the following code
informs the server that HTML files are accepted (something most Web servers will
assume anyway), and then finishes off the headers:

MyHTTP.putheader('Accept', 'text/html')
MyHTTP.endheaders()

You can pass multiple values for a header in one call to putheader.

After setting up any headers, you may (usually on a POST request) send additional 
data to the server by calling send(data). 

Now that you have built the request, you can get the server’s reply. The method
getreply returns the server’s response in a 3-tuple: (replycode, message,
headers). Here, replycode is the HTTP status code (200 for success, or perhaps the
infamous 404 for “resource not found”).

The body of the server’s reply is returned (as a file object with read and close
methods) by the method getfile. This is where the request object finally receives
what it asks for.
For example, the following code retrieves the front page from www.yahoo.com:

>>> import httplib
>>> Request=httplib.HTTP("www.yahoo.com")
>>> Request.putrequest("GET","/")
>>> Request.endheaders()
>>> Request.getreply()
(200, 'OK', <mimetools.Message instance at 0085EBD4>)
>>> ThePage=Request.getfile()
>>> print ThePage.readline()[:50]
<html><head><title>Yahoo!</title><base href=http:/
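The same steps look almost identical in Python 3, where httplib was renamed http.client and getreply/getfile were replaced by getresponse. The sketch below assumes a throwaway local server (built with http.server on 127.0.0.1) in place of www.yahoo.com so that it runs without Internet access; the FrontPage handler and its page are made up.

```python
import http.client
import http.server
import threading

class FrontPage(http.server.BaseHTTPRequestHandler):
    """Tiny stand-in for a Web server, so the example runs offline."""
    def do_GET(self):
        body = b"<html><head><title>Test</title></head></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet

server = http.server.HTTPServer(("127.0.0.1", 0), FrontPage)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# The same steps as above, with Python 3's http.client:
conn = http.client.HTTPConnection(host, port)
conn.putrequest("GET", "/")            # request method and resource
conn.putheader("Accept", "text/html")  # optional extra headers
conn.endheaders()                      # finish the headers and send
reply = conn.getresponse()             # status, reason, headers, body
status, reason = reply.status, reply.reason
page = reply.read()
conn.close()
server.shutdown()
server.server_close()
print(status, reason)   # 200 OK
print(page)
```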

Listing 16-1 performs a Web search by sending a POST request. Data in a POST
request must be properly encoded using urllib.urlencode. The listing uses an
HTMLParser (from htmllib) to extract all links from the search results.

See Chapter 18 for complete information about htmllib.



Listing 16-1: WebSearch.py 


import httplib
import htmllib
import urllib
import formatter

# Encode our search terms as a URL, by
# passing a dictionary to urlencode
SearchDict={"q":"Charles Dikkins",
            "kl":"XX","pg":"q","Translate":"on"}
SearchString=urllib.urlencode(SearchDict)
print "search:",SearchString
Request=httplib.HTTP("www.altavista.com")
Request.putrequest("POST","/cgi-bin/query")
Request.putheader('Accept', 'text/plain')
Request.putheader('Accept', 'text/html')
Request.putheader('Host', 'www.altavista.com')
# Form data in a POST body is labeled as URL-encoded
Request.putheader("Content-type","application/x-www-form-urlencoded")
# Content-length is the body size in bytes, sent as a string
Request.putheader("Content-length",str(len(SearchString)))
Request.endheaders()
Request.send(SearchString)
print Request.getreply()

# Read and parse the resulting HTML
HTML=Request.getfile().read()
MyParser=htmllib.HTMLParser(formatter.NullFormatter())
MyParser.feed(HTML)
MyParser.close()

# Print all the anchors from the results page
print MyParser.anchorlist
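Python 3 has no htmllib (html.parser replaces it), and urlencode moved to urllib.parse. A minimal sketch of the two offline halves of Listing 16-1 follows: encoding the search form and pulling the links out of a results page. The AnchorCollector class and the sample HTML are hypothetical stand-ins; the POST itself would go through http.client, with the encoded bytes as the request body.

```python
import urllib.parse
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, like htmllib's anchorlist."""
    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.anchorlist.append(value)

# Same search terms as Listing 16-1; dicts keep insertion order in 3.7+.
SearchString = urllib.parse.urlencode(
    {"q": "Charles Dikkins", "kl": "XX", "pg": "q", "Translate": "on"})
print(SearchString)  # q=Charles+Dikkins&kl=XX&pg=q&Translate=on
# A POST body must be bytes, and Content-length counts those bytes.
body = SearchString.encode("ascii")

# Stand-in for a results page (the real page would come from the server).
html = '<p><a href="http://www.python.org/">Python</a> <a name="top">x</a></p>'
parser = AnchorCollector()
parser.feed(html)
parser.close()
print(parser.anchorlist)  # ['http://www.python.org/']
```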
Sending and Receiving E-Mail

Python provides libraries that receive mail from, and send mail to, a mail server.
Electronic mail is transmitted via various protocols. The most common mail
protocols are POP3 (for receiving mail), SMTP (for sending mail), and IMAP4 (for reading
mail and managing mail folders). They are supported by the Python modules
poplib, smtplib, and imaplib, respectively.

Accessing POP3 accounts

To access a POP3 mail account, you construct a POP3 object. The POP3 object
offers various methods to retrieve and delete mail. It raises the exception
poplib.error_proto if it encounters problems. See RFC 1939 for the full POP3
protocol.

Many of its methods return output as a 3-tuple: a server response string, response 
lines (as a list), and total response length (in bytes). In general, you can access the 
second tuple element and ignore the others. 


Connecting and logging in 

The POP3 constructor takes two arguments: host and port number. The port
parameter is optional, and defaults to 110. For example:

Mailbox=poplib.POP3("mail.gianth.com") # connect to mail server

After connecting, you can access the mail server’s greeting by calling getwelcome.
You normally sign in by calling user(name) and then pass_(password). (The method
name ends in an underscore because pass is a Python keyword.) To sign on
using APOP authentication, call apop(username, secret). To sign in using RPOP,
call rpop(username). (Currently, rpop is not supported.)

Once you log in, the mailbox is locked until you call quit (or the session times 
out). To keep a session from timing out, you can call the method noop, which 
simply keeps the session alive. 


Checking mail 

The method stat checks the mailbox’s status. It returns a tuple of two numbers: 
the number of messages and the total size of your messages (in bytes). 

The method list([index]) lists the messages in your inbox. It returns a 3-tuple,
where the second element is a list of message entries. A message entry is the
message number, followed by its size in bytes. Passing a message index to list makes
it return just that message’s entry:

>>> Mailbox.list()
('+OK 2 messages (10012 octets)', ['1 9003', '2 1009'], 16)
>>> Mailbox.list(2)
'+OK 2 1009'
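The entries in the list returned by list are plain strings, so turning them into numbers takes a little parsing. A minimal sketch; parse_scan_listing is a hypothetical helper, and it also accepts bytes because Python 3's poplib returns the lines that way.

```python
def parse_scan_listing(lines):
    """Turn POP3 LIST lines such as '1 9003' into {message number: size}.

    Hypothetical helper; accepts str or bytes lines.
    """
    sizes = {}
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("ascii")
        number, size = line.split()[:2]
        sizes[int(number)] = int(size)
    return sizes

# The second element of the list() tuple from the session above:
print(parse_scan_listing(['1 9003', '2 1009']))   # {1: 9003, 2: 1009}
```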
The method uidl([index]) retrieves unique identifiers for the messages in a
mailbox. Unique identifiers are unchanged by the addition and deletion of messages,
and they are unique across sessions. The method returns a list of message indexes
and corresponding unique IDs:

>>> Mailbox.uidl()
('+OK 2 messages (10012 octets)', ['1 2', '2 3'], 10)
>>> Mailbox.uidl(2)
'+OK 2 3'
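Because unique IDs survive across sessions, a client can remember which IDs it has already seen and download only the rest. A sketch of that bookkeeping, with a hypothetical helper and a seen_uids set that your program would have to store between sessions itself (POP3 keeps no record of what you have read):

```python
def new_message_numbers(uidl_lines, seen_uids):
    """Return message numbers whose unique ID is not in seen_uids.

    uidl_lines is the list part of Mailbox.uidl(), e.g. ['1 2', '2 3'].
    Hypothetical helper; accepts str or bytes lines.
    """
    fresh = []
    for line in uidl_lines:
        if isinstance(line, bytes):
            line = line.decode("ascii")
        number, uid = line.split()[:2]
        if uid not in seen_uids:
            fresh.append(int(number))
    return fresh

print(new_message_numbers(['1 2', '2 3'], {'2'}))   # [2]
```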

Retrieving mail

The method retr(index) retrieves and returns message number index from your
mailbox. What you get back is actually a tuple: the server response, a list of
message lines (including headers), and the total response length (in bytes). To retrieve
part of a message, call the method top(index, lines); top is the same as retr,
but returns only the headers and the first lines lines of the body.


Deleting mail

Use the method dele(index) to delete message number index. If you change your
mind, use the method rset to cancel all deletions you have done in the current
session. Deleted messages are actually removed when you call quit.


Signing off 

When you finish accessing a mailbox, call the quit method to sign off. 


Example: retrieving mail

The code in Listing 16-2 signs on to a mail server and retrieves the full text of the
first message in the mailbox. It does no fancy error handling. It strips off all the
message headers, printing only the body of the message.


Listing 16-2: popmail.py 


import poplib

# Replace server, user, and password with your
# mail server, user name, and password!
Mailbox=poplib.POP3("mail.seanbaby.com")
Mailbox.user("dumplechan@seanbaby.com")
Mailbox.pass_("secretpassword")
MyMessage=Mailbox.retr(1)
FullText="" # Build up the message body in FullText
PastHeaders=0
for MessageLine in MyMessage[1]:
    if PastHeaders==0:
        # A blank line marks the end of headers:
        if (len(MessageLine)==0):
            PastHeaders=1
    else:
        FullText+=MessageLine+"\n"
Mailbox.quit()
print FullText
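In Python 3 the email package can do the header-stripping work of Listing 16-2. A sketch under two assumptions: the lines come from Python 3's poplib (bytes, with line endings removed), and the message is plain text rather than multipart. The message_body helper and the sample message are hypothetical; get_body and get_content require the email.policy.default policy, which is why it is passed explicitly.

```python
import email
import email.policy

def message_body(lines):
    """Rebuild a message from retr()'s line list and return its text body."""
    raw = b"\r\n".join(lines)   # poplib strips the CRLFs; put them back
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""

# Offline stand-in for Mailbox.retr(1)[1]:
sample = [b"From: dumplechan@seanbaby.com", b"Subject: Hi", b"",
          b"Howdy cowboy!"]
print(message_body(sample))
```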


Accessing SMTP accounts 

The module smtplib defines an object, SMTP, that you use to send mail using the
Simple Mail Transfer Protocol (SMTP). An enhanced version of SMTP, called
ESMTP, is also supported. See RFC 821 for the SMTP protocol, and RFC 1869 for
information about extensions.


Connecting and disconnecting 

You can pass a host name and a port number to the SMTP constructor. This con- 
nects you to the server immediately. The port number defaults to 25: 

Outbox=smtplib.SMTP("mail.gianth.com")

If you do not supply a host name when you construct an SMTP object, you must call
its connect method, passing it a host name and (optionally) a port number. The
host name can specify a port number after a colon:

Outbox=smtplib.SMTP()
Outbox.connect("mail.gianth.com:25")

After you finish sending mail, you should call the quit method to close the 
connection. 


Sending mail 

The method sendmail(sender, recipients, message[, options,
rcpt_options]) sends e-mail. The parameter sender is the message author (usually
your e-mail address!). The parameter recipients is a list of addresses that should
receive the message. The parameter message is the message as one long string,
including all its headers. For example:

>>> MyAddress="bob@myserver.com"
>>> TargetAddress="earl@otherserver.com"
>>> HeaderText="From: "+MyAddress+"\r\n"
>>> HeaderText+="To: "+TargetAddress+"\r\n\r\n"
>>> Outbox.sendmail(MyAddress,[TargetAddress],HeaderText+"Hi!")
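Rather than assembling header strings by hand, Python 3 code would usually let the email package build the message. A sketch reusing the addresses from the example above; the Subject line is made up, and the actual send is left as a comment because it needs a live server (SMTP.send_message exists in Python 3.2 and later).

```python
from email.message import EmailMessage

# Build the same short message, letting the email package
# produce the headers instead of gluing strings together by hand.
msg = EmailMessage()
msg["From"] = "bob@myserver.com"
msg["To"] = "earl@otherserver.com"
msg["Subject"] = "Greetings"          # made-up subject
msg.set_content("Hi!")

text = msg.as_string()
print(text)
# With a live connection this could be sent as:
#   Outbox.sendmail(msg["From"], [msg["To"]], text)
# or simply Outbox.send_message(msg).
```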
To use extended options, pass a list of ESMTP options in the options parameter. You
can pass RCPT options in the rcpt_options parameter.

The method sendmail raises an exception if it could not send mail to any recipient.
If at least one address succeeded, it returns a dictionary explaining any failures. In
this dictionary, each key is an address. The corresponding value is a tuple: result
code and error message.

Other methods 

The method verify(address) checks an e-mail address address for validity. It
returns a tuple: the first entry is the response code, the second is the server’s
response string. A response code of 250 means success; a code of 400 or above means failure:

>>> Outbox.verify("dumplechan@seanbaby.com")
(250, 'ok its for <dumplechan@seanbaby.com>')
>>> Outbox.verify("dimplechin@seanbaby.com")
(550, 'unknown user <dimplechin@seanbaby.com>')

An ESMTP server may support various extensions to SMTP, such as delivery status
notification (DSN). The method has_extn(name) returns true if the server supports
a particular extension:

>>> Outbox.has_extn("DSN") # is status-notification available? 

1 

To identify yourself to a server, you can call helo([host]) for an SMTP server, or
ehlo([host]) for an ESMTP server. The optional parameter host defaults to the
fully qualified domain name of the local host. The methods return a tuple: result
code (250 for success) and server response string. Because the sendmail method
can handle the HELO command, you do not normally need to call these methods
directly.


Handling errors

Methods of an SMTP object may raise the following exceptions if they encounter an 
error: 


SMTPException            Base exception class for all smtplib exceptions.
SMTPServerDisconnected   The server unexpectedly disconnected, or no
                         connection has been made yet.
SMTPResponseException    Base class for all exceptions that include an
                         SMTP error code. An SMTPResponseException
                         has two attributes: smtp_code (the response
                         code of the error, such as 550 for an invalid
                         address) and smtp_error (the error message).
SMTPSenderRefused        Sender address refused. The exception
                         attribute sender is the invalid sender.
SMTPRecipientsRefused    All recipient addresses refused. The errors for
                         each recipient are accessible through the
                         attribute recipients, which is a dictionary of
                         exactly the same sort as SMTP.sendmail()
                         returns.
SMTPDataError            The SMTP server refused to accept the message
                         data.
SMTPConnectError         An error occurred during establishment of a
                         connection with the server.
SMTPHeloError            The server refused a “HELO” message.
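The class hierarchy matters when catching these exceptions: SMTPSenderRefused, SMTPDataError, SMTPConnectError and SMTPHeloError are all SMTPResponseException subclasses, so specific handlers must come first. A sketch in Python 3, using a hypothetical explain helper that is exercised with exceptions constructed by hand rather than raised by a real server:

```python
import smtplib

def explain(exc):
    """Turn an smtplib exception into a one-line summary (hypothetical helper).

    More specific classes are tested before their base classes.
    """
    if isinstance(exc, smtplib.SMTPRecipientsRefused):
        # Same dictionary shape as sendmail's return value:
        # address -> (code, message)
        bad = ", ".join("%s (%d)" % (addr, code)
                        for addr, (code, _msg) in sorted(exc.recipients.items()))
        return "all recipients refused: " + bad
    if isinstance(exc, smtplib.SMTPSenderRefused):
        return "sender %s refused (%d)" % (exc.sender, exc.smtp_code)
    if isinstance(exc, smtplib.SMTPResponseException):
        return "server said %d" % exc.smtp_code
    if isinstance(exc, smtplib.SMTPServerDisconnected):
        return "not connected"
    return "other SMTP problem"

# Typical use around a real send (not run here):
#   try:
#       refused = Outbox.sendmail(sender, recipients, text)
#   except smtplib.SMTPException as exc:
#       print(explain(exc))
print(explain(smtplib.SMTPRecipientsRefused(
    {"dimplechin@seanbaby.com": (550, b"unknown user")})))
```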

Accessing IMAP accounts 

IMAP is a protocol for accessing mail. Like POP, it enables you to read and delete 
messages. IMAP offers additional features, such as searching for message text and 
organizing messages in separate mailboxes. However, IMAP is harder to use than 
POP, and is far less commonly used. 

See RFC 2060 for the full description of IMAP4rev1. 


The module imaplib provides a class, IMAP4, to serve as an IMAP client. The
names of IMAP4 methods correspond to the commands of the IMAP protocol. Most
methods return a tuple (code, data), where code is “OK” (good) or “NO” (bad), and
data is the text of the server response.

The IMAP protocol includes various magical behaviors. For example, you can move
all the messages from INBOX into a new mailbox by attempting to rename INBOX.
(The INBOX folder isn’t actually renamed, but its contents are moved to the other
mailbox!) Not all the features of the protocol are covered here; consult RFC 2060 for
more information.

Connection, logon, and logoff 

The IMAP4 constructor takes host and port arguments, which function here just as
they do for a POP3 object. If you construct an IMAP4 object without specifying a
host, you must call open(host,port) to connect to a server before you can use
other methods. The port number defaults to 143.

To log in, call the method login(user,password). Call logout to log off. The
method noop keeps an existing session alive. For example:

>>> import imaplib
>>> imap=imaplib.IMAP4("mail.mundomail.net")
>>> imap.login("dumplechan","tacos")
('OK', ['LOGIN completed'])
>>> imap.noop()
('OK', ['NOOP completed'])
An IMAP server may use more advanced authentication methods. To authenticate
in fancier ways, call the method authenticate(mechanism, handler). Here,
mechanism is the name of the authentication mechanism, and handler is a function that
receives challenge strings from the server and returns response strings. (Base64
encoding is handled internally.)
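As an example of a handler, here is a sketch for the CRAM-MD5 mechanism (RFC 2195), in which the client answers the server's challenge with its user name and an HMAC-MD5 digest keyed by the password. It is written for Python 3, where imaplib passes the decoded challenge as bytes; imaplib also offers a ready-made login_cram_md5 method. The user name, password, and challenge are the worked example from RFC 2195.

```python
import hmac

def cram_md5_handler(user, password):
    """Build an authenticate() handler for CRAM-MD5 (hypothetical helper).

    imaplib passes the handler the server's challenge (already
    base64-decoded) and base64-encodes whatever it returns.
    """
    def respond(challenge):
        digest = hmac.new(password.encode("utf-8"), challenge, "md5").hexdigest()
        return "%s %s" % (user, digest)
    return respond

handler = cram_md5_handler("tim", "tanstaaftanstaaf")
print(handler(b"<1896.697170952@postoffice.reston.mci.net>"))
# tim b913a602c7eda7a495b4e6e7334d3890
# Live use (not run here): imap.authenticate("CRAM-MD5", handler)
```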


Checking, reading, and deleting mail

Before you can do anything with messages, you must choose a mailbox. The mailbox
INBOX is always available. To select a mailbox, call select([mailbox[,
readonly]]). The parameter mailbox is the mailbox name, which defaults to
INBOX. If readonly is present and true, then modifications to the mailbox are
forbidden. The return value includes the number of messages in the mailbox. For example:

>>> imap.select("INBOX")
('OK', ['2'])

When finished with a mailbox, call close to close it.

The method search(charset,criteria...) searches the current mailbox for
messages satisfying one or more criteria. The parameter charset, if not None,
specifies a particular character set to use. One or more values can be passed as
criteria; these are concatenated into one search string. A list of matching message
indexes is returned. Note that text (other than keywords) in criteria should be
quoted. For example, the following code checks for messages from the president
(none today), and then checks for messages whose subject contains “Howdy!” (and
finds message number 2):

>>> imap.search(None,"ALL","FROM",'"president@whitehouse.gov"')
('OK', [None])
>>> imap.search(None,"ALL","SUBJECT",'"Howdy!"')
('OK', ['2'])

To retrieve a message, call fetch(messages,parts). Here, messages is a string
listing messages, such as "2", or "2,7", or "3:5" (for messages 3 through 5). The
parameter parts should be a parenthesized list of what parts of the message(s) to
retrieve: for instance, FULL for the entire message, BODY for just the body. For
example:

>>> imap.fetch("2","(BODY[text])")
('OK', [('2 (BODY[text] {13}', 'Howdy cowboy!'), ')', '2 (FLAGS (\\SEEN))'])
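Message sets like these can be built from a list of message numbers by collapsing consecutive runs into ranges. A small sketch of such a helper; message_set is hypothetical, not part of imaplib.

```python
def message_set(numbers):
    """Compress message numbers into an IMAP set such as '2,7' or '3:5'.

    Hypothetical helper; the result can be passed to fetch, store, or copy.
    """
    nums = sorted(set(numbers))
    if not nums:
        raise ValueError("no messages")
    ranges = []
    start = prev = nums[0]
    for n in nums[1:] + [None]:          # None flushes the last run
        if n is not None and n == prev + 1:
            prev = n                     # still inside a consecutive run
            continue
        ranges.append(str(start) if start == prev else "%d:%d" % (start, prev))
        if n is not None:
            start = prev = n
    return ",".join(ranges)

print(message_set([2, 7]))          # 2,7
print(message_set([5, 3, 4, 9]))    # 3:5,9
```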

To change a message’s status, call store(messages,command,flags). Here, command
is the command to perform, such as "+FLAGS" (add flags), "-FLAGS" (remove flags), or
"FLAGS" (replace the flags). The parameter flags is a string listing the flags to set
or remove, such as "(\Deleted)". For example, the following line of code marks
message 2 as deleted:

>>> imap.store("2","+FLAGS","(\\Deleted)")
('OK', ['2 (FLAGS (\\SEEN \\DELETED))'])
The method expunge permanently removes all messages marked as deleted by a
\Deleted flag. Such messages are automatically expunged when you close the
current mailbox.

The method copy(messages,newmailbox) copies a set of messages to the
mailbox named newmailbox.

The method check does a mailbox “checkpoint” operation; what this means
depends on the server.

You normally operate on messages by index number. However, messages also have
a unique identifier, or uid. To use uids to name messages, call the method
uid(commandname, [args...]). This carries out the command commandname using
uids instead of message indices.


Administering mailboxes 

To create a new mailbox, call create(name). To delete a mailbox, call delete(name).
Call rename(oldname,newname) to rename mailbox oldname to the name newname.

Mailboxes can contain other mailboxes. For example, the name “nudgenudge/winkwink”
indicates a sub-box named “winkwink” inside a master mailbox
“nudgenudge.” The hierarchy separator character varies by server; some servers would
name the mailbox “nudgenudge.winkwink.”

A mailbox can be marked as subscribed. The effects of subscribing vary by server,
but generally subscriptions are a way of flagging mailboxes of particular interest.
Use subscribe(name) and unsubscribe(name) to toggle subscription status for
the mailbox name.

The command list([root[,pattern]]) finds mailbox names. The parameter root
is the base of a mailbox hierarchy to list. It defaults to "" (not a blank string, but a
string of two double-quotes) for the root level. The parameter pattern is a string to
search for; pattern may contain the wildcards * (matching anything) and % (matching
anything but a hierarchy delimiter). The output of list is a list of strings, one per
mailbox, each with three parts. The first part is a parenthesized list of flags, such as
\Noselect. The second is the server’s hierarchy separator character. The third is the
mailbox name.
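Each mailbox entry returned by list is a single string, so picking out the three parts takes a small parser. A sketch with a hypothetical parse_list_line helper; it assumes the separator is quoted, as in the example below, whereas some servers send NIL for mailboxes without a hierarchy.

```python
import re

# Shape of one LIST entry: (flags) "delimiter" name
# Assumes a quoted delimiter; NIL delimiters are not handled.
LIST_LINE = re.compile(r'\((?P<flags>[^)]*)\) "(?P<delim>[^"]*)" (?P<name>.+)')

def parse_list_line(line):
    """Split one entry of imap.list()'s data into (flags, delimiter, name)."""
    if isinstance(line, bytes):          # Python 3's imaplib returns bytes
        line = line.decode("utf-8")
    m = LIST_LINE.match(line)
    if m is None:
        raise ValueError("unexpected LIST line: %r" % line)
    name = m.group("name").strip('"')
    return m.group("flags").split(), m.group("delim"), name

print(parse_list_line(b'(\\Noselect) "/" "x1"'))
# (['\\Noselect'], '/', 'x1')
```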

To list only subscribed mailboxes, use the command lsub([root[,pattern]]). 

For example, the following code creates and lists some mailboxes:

>>> print imap.list()
('OK', ['() "/" "INBOX"'])
>>> imap.create("x1")
('OK', ['CREATE completed'])
>>> imap.create("x1/y1")
('OK', ['CREATE completed'])
>>> imap.create("x1/y2")
('OK', ['CREATE completed'])
>>> imap.rename("x1/y2","x1/y3")
('OK', ['RENAME completed'])
>>> imap.list()
('OK', ['() "/" "INBOX"', '() "/" "x1"', '() "/" "x1/y1"', '() "/" "x1/y3"'])


You can check the status of a mailbox by calling status(mailbox,names). The
parameter mailbox is the name of a mailbox. The parameter names is a
parenthesized string of status items to check. For example:

>>> imap.status("INBOX","(MESSAGES UIDNEXT)")
('OK', ['"INBOX" (MESSAGES 1 UIDNEXT 3)'])
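The data returned by status is a string of alternating item names and values. A sketch of turning it into a dictionary, with a hypothetical parse_status helper that assumes the mailbox name itself contains no parentheses:

```python
import re

def parse_status(line):
    """Turn '"INBOX" (MESSAGES 1 UIDNEXT 3)' into {'MESSAGES': 1, 'UIDNEXT': 3}."""
    if isinstance(line, bytes):          # Python 3's imaplib returns bytes
        line = line.decode("utf-8")
    m = re.search(r'\(([^)]*)\)\s*$', line)
    if m is None:
        raise ValueError("unexpected STATUS line: %r" % line)
    items = m.group(1).split()
    # Items alternate: name, value, name, value, ...
    return {items[i]: int(items[i + 1]) for i in range(0, len(items), 2)}

print(parse_status('"INBOX" (MESSAGES 1 UIDNEXT 3)'))
# {'MESSAGES': 1, 'UIDNEXT': 3}
```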

Other functions 

You can add a message to a mailbox by calling the method append(mailbox,
flags, datetime, message). Here, mailbox is the name of the mailbox, flags is an
optional list of message flags, datetime is a timestamp for the message, and message
is the message text, including headers.

IMAP uses an INTERNALDATE representation for dates and times. Use the module
function Internaldate2tuple(date) to translate an INTERNALDATE to a
TimeTuple, and the function Time2Internaldate(tuple) to go from TimeTuple to
INTERNALDATE.

See Chapter 13 for more information about the time module's tuple representation
of time.
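A quick round trip shows the two conversion functions working together. This sketch is for Python 3's imaplib, where Internaldate2tuple expects bytes, and it uses a fixed timestamp (1000000000 seconds after the epoch) so the result is repeatable; the printed date string depends on your local time zone.

```python
import imaplib
import time

stamp = 1000000000   # 2001-09-09 01:46:40 UTC
# Time2Internaldate accepts seconds since the epoch (or a time tuple) and
# returns a quoted string in the local time zone, ready for append().
internal = imaplib.Time2Internaldate(stamp)
print(internal)   # "09-Sep-2001 01:46:40 +0000" if local time is UTC

# Internaldate2tuple expects a server response containing INTERNALDATE
# (as bytes in Python 3) and returns a time tuple in local time.
response = b"1 (INTERNALDATE " + internal.encode("ascii") + b")"
tt = imaplib.Internaldate2tuple(response)
print(time.mktime(tt) == stamp)   # True: the round trip is lossless
```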

The function ParseFlags(str) splits an IMAP4 FLAGS response into a tuple of flags. 
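For instance, ParseFlags can pick the flags out of a response like the one store returned earlier. A sketch for Python 3, where imaplib works with bytes, so both the input and the resulting flags are bytes; the response text is a made-up example.

```python
import imaplib

# One FETCH/STORE response item, as bytes (Python 3's imaplib uses bytes).
reply = b'2 (FLAGS (\\Seen \\Deleted))'
flags = imaplib.ParseFlags(reply)
print(flags)                      # (b'\\Seen', b'\\Deleted')
print(b'\\Deleted' in flags)      # True
```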


Handling errors

The class IMAP4.error is the exception raised by any errors using an IMAP4 object.
The error argument is an error message string. It has subclasses IMAP4.abort (raised
for server errors) and IMAP4.readonly (raised if the server changed a mailbox
while you were reading mail, and you must re-open the mailbox).
Transferring Files via FTP

The module ftplib provides the class FTP, which serves as an FTP client. The
Python source distribution includes a script, Tools/scripts/ftpmirror.py, that
uses ftplib to mirror an FTP site.

See RFC 959 for more on the FTP protocol. 



Logging in and out

The FTP constructor takes several optional parameters. A call to FTP([host[,
user[,password[,acct]]]]) constructs and returns an FTP object. The
constructor also connects to the specified host if host is supplied. If user is supplied,
the constructor logs in using the user user, the password password, and the
account acct.

You can also connect to a host by calling the FTP method connect(hostname
[,port]). The port number defaults to 21; you will probably never need to set it
manually. You can log in by calling login([user[,password[,acct]]]). If user is
not specified, anonymous login is performed. The following two examples
demonstrate the long and short way to log on to a server:

>>> import ftplib
>>> # long way:
>>> session=ftplib.FTP()
>>> session.connect("gianth.com")
'220 gianth Microsoft FTP Service (Version 5.0).'
>>> session.login() # anonymous login (login string returned)
'230-Niao! Greetings from Giant H Laboratories!\015\012230 Anonymous user logged in.'
>>> # short way:
>>> session2=ftplib.FTP("gianth.com","anonymous","bob@aol.com")

The method getwelcome returns the server welcome string (the same string
returned by connect).

When finished with an FTP connection, call quit or close. (The only difference
between the two is that quit sends a “polite” response to the server.)


Navigating 

The method pwd returns the current path on the server. The method cwd(path)
sets the path on the server. You can call mkd(path) to create a new directory; call
rmd(dirname) to delete an empty directory.
The method nlst([dir[,args]]) returns directory contents as a list of file
names. The method dir([dir[,args]]) retrieves a full directory listing (the
output of the server’s LIST command) for processing. By default, both methods
list the current directory; pass a different path in dir to list a different one.
Extra string arguments are passed along to the server. If the last argument
to dir is a function, that function is used as a callback when retrieving each line
(see retrlines, in the next section); the default processor simply prints each line.

The method size(filename) returns the size of a particular file. You can delete a
file with delete(filename), and rename a file by calling rename(oldname,
newname).


Transferring files 

To store (upload) a file, call storbinary(command, file, blocksize) for binary
files, or storlines(command, file) for plain text files. The parameter command is
the command passed to the server. The parameter file should be an opened file
object (opened in binary mode for storbinary). The storbinary parameter blocksize
is the block size for data transfer. For example, the following code uploads a
sound file to a server in 8K blocks, and then verifies that the file exists on the server:

>>> Source=open("c:\\SummerRain.mp3","rb")
>>> session.storbinary("STOR SummerRain.mp3",Source,8192)
'226 Transfer complete.'
>>> session.nlst().index("SummerRain.mp3")

To retrieve (download) a file, call retrbinary(command,callback[,blocksize
[,rest]]) or retrlines(command[,callback]). The parameter command is the
command passed to the server. The parameter callback is a function to be called
once for each block of data received. Python passes the block of data to the
callback function. (The default callback for retrlines simply prints each line.) The
parameter blocksize is the maximum size of each block. Supply a byte position for
rest to continue a download part way through a file. For example, the following code
retrieves a file from the server and saves it to a local file:

>>> destination=open("foo.mp3","wb")
>>> session.retrbinary("RETR SummerRain.mp3",destination.write)
'226 Transfer complete.'
>>> destination.close()
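Any callable can serve as the callback, so a small class can save each block and keep a running byte count, which is also the value you would pass as rest to resume an interrupted download. A sketch in Python 3; DownloadWriter is a hypothetical helper, and the demonstration feeds it blocks by hand instead of connecting to a server.

```python
import io

class DownloadWriter:
    """Callback for retrbinary that saves blocks and counts bytes (hypothetical)."""
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.received = 0

    def __call__(self, block):
        # retrbinary calls this once per block of bytes received.
        self.fileobj.write(block)
        self.received += len(block)

# Offline demonstration: feed it blocks by hand, as retrbinary would.
sink = io.BytesIO()
writer = DownloadWriter(sink)
for block in (b"ID3", b"\x00" * 8192, b"tail"):
    writer(block)
print(writer.received)   # 8199
# Live use (not run here):
#   with open("foo.mp3", "wb") as destination:
#       session.retrbinary("RETR SummerRain.mp3", DownloadWriter(destination))
```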

A lower-level method for file transfer is ntransfercmd(command[,rest]), which
returns a 2-tuple: a socket object and the expected file size in bytes. The method
transfercmd(command[,rest]) is the same as ntransfercmd, but returns only a
socket object.

The method abort cancels a transfer in progress. 


Other methods 

The method set_pasv(value) sets passive mode to value. If value is true, the
PASV command is sent to the server for file transfers; otherwise, the PORT
command is used. (As of Python Version 2.1, passive mode is on by default; in
previous versions, passive mode was not on by default.)
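In passive mode the client sends PASV and the server answers with reply 227, listing the IP address and port to which the client should open the data connection, with the port split into two bytes. A sketch of decoding such a reply in Python 3, checked against ftplib's parse227 helper (undocumented, so it could change); parse_pasv_reply and the sample reply are made up.

```python
import ftplib
import re

def parse_pasv_reply(reply):
    """Decode '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' (hypothetical helper).

    The server lists its IP address as four numbers and the data port
    as two bytes: port = p1 * 256 + p2.
    """
    m = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if m is None:
        raise ValueError("not a PASV reply: %r" % reply)
    numbers = m.groups()
    host = ".".join(numbers[:4])
    port = int(numbers[4]) * 256 + int(numbers[5])
    return host, port

reply = "227 Entering Passive Mode (192,168,1,2,19,137)."
print(parse_pasv_reply(reply))    # ('192.168.1.2', 5001)
print(ftplib.parse227(reply))     # ('192.168.1.2', 5001)
```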

The method set_debuglevel(level) sets the level of debug output from ftplib:
0 (the default level) produces no debug output; 2 is the most verbose.

Handling errors

The module defines several exceptions: error_reply is raised when the server
sends an unexpected reply; error_temp is raised for “temporary errors” (with
error codes in the range 400-499); error_perm is raised for “permanent errors”
(with error codes in the range 500-599); and error_proto is raised for replies
with unknown error codes.
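Which of these exceptions applies depends only on the first digit of the server's reply code. A sketch of that rule in Python 3, with a hypothetical reply_error helper; ftplib.all_errors is a tuple of all the exception types an FTP session can raise, including socket errors, which makes it a convenient catch-all.

```python
import ftplib

def reply_error(resp):
    """Return the ftplib exception class that fits reply resp, or None if OK."""
    first = resp[:1]
    if first in ("1", "2", "3"):
        return None                  # preliminary, success, or "send more"
    if first == "4":
        return ftplib.error_temp     # temporary: retrying later may work
    if first == "5":
        return ftplib.error_perm     # permanent: don't retry unchanged
    return ftplib.error_proto        # not a valid FTP reply code

resp = "550 SummerRain.mp3: No such file."
problem = reply_error(resp)
try:
    if problem is not None:
        raise problem(resp)
except ftplib.all_errors as exc:
    print(type(exc).__name__, exc)   # error_perm 550 SummerRain.mp3: No such file.
```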


Using netrc files 

The supporting module netrc is used to parse .netrc files. These files cache user
information for various FTP servers, so that you don’t need to send it to the host by
hand each time. They can also store macros.

The module provides a class, netrc, for accessing netrc contents. The constructor
netrc([filename]) builds a netrc object by parsing the specified file. If filename
is not provided, it defaults to the file .netrc in your home directory.

The attribute hosts is a dictionary mapping from host names to authentication
information of the form (username, account, password). If the parsed .netrc file
includes a default entry, it is stored in hosts["default"]. The attribute macros is
a dictionary, mapping macro names to string lists. The method
authenticators(hostname) returns either the authentication tuple for hostname,
the default tuple (if there is no tuple for hostname), or (if there is no default either)
None.

The netrc class implements a __repr__ method that returns .netrc file contents.
This means that you can edit an existing file. For example, the following code adds
(or overrides) an entry on disk:

import netrc
MyNetrc=netrc.netrc(".netrc")
MyNetrc.hosts["ftp.oracle.com"] = ("stanner","","weeble")
NetrcFile=open(".netrc","w") # open for writing, replacing the old contents
NetrcFile.write(repr(MyNetrc))
NetrcFile.close()
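A runnable sketch of reading and updating a .netrc file in Python 3. It works on a throwaway file in a temporary directory, with hypothetical hosts and credentials, and writes the file back with a small hand-rolled formatter (format_netrc, also hypothetical) so the output format is explicit; repr() serves the same purpose in the example above.

```python
import netrc
import os
import tempfile

def format_netrc(hosts):
    """Write hosts ({name: (login, account, password)}) in .netrc syntax."""
    lines = []
    for host, (login, account, password) in hosts.items():
        entry = "machine %s login %s" % (host, login)
        if account:                      # skip empty or missing accounts
            entry += " account %s" % account
        lines.append(entry + " password %s" % password)
    return "\n".join(lines) + "\n"

# A throwaway .netrc with a hypothetical host and credentials.
path = os.path.join(tempfile.mkdtemp(), ".netrc")
with open(path, "w") as f:
    f.write("machine ftp.example.com login bob password secret\n")

info = netrc.netrc(path)
print(info.authenticators("ftp.example.com")[0])   # bob
print(info.authenticators("unknown.host"))         # None: no default entry

# Add an entry, save, and read it back.
info.hosts["ftp.oracle.com"] = ("stanner", "", "weeble")
with open(path, "w") as f:
    f.write(format_netrc(info.hosts))
print(netrc.netrc(path).authenticators("ftp.oracle.com")[2])   # weeble
```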


Retrieving Resources Using Gopher 

Gopher is a protocol for transferring hypertext and multimedia over the Internet.
With the rise of the World Wide Web, Gopher is no longer widely used. However, the
urllib module supports it, and the gopherlib module supports Gopher requests.
See RFC 1436 for the definition of the Gopher protocol.


The function send_selector(selector,host[,port]) sends a selector (analogous
to a URL) to the specified host. It returns an open file object that you can read
from. The port-number parameter, port, defaults to 70. For example, the following
code retrieves and prints the Gopher Manifesto:

import gopherlib
Manifesto=gopherlib.send_selector(
    "0/the gopher manifesto.txt","gopher.heatdeath.org")
print Manifesto.read()

The function send_query(selector,query,host[,port]) is similar to
send_selector, but sends the query string query to the server along with the selector.
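Python 3 dropped gopherlib, but the protocol is simple enough to sketch by hand: after connecting to port 70, the client sends the selector (plus a tab and the query, for searches) followed by CRLF, and a menu comes back as tab-separated lines of item type and display text, selector, host, and port (RFC 1436). The helpers below are hypothetical and do no networking; the menu line is made up.

```python
def gopher_request(selector, query=None):
    """Bytes a client sends after connecting: selector[TAB query] CRLF."""
    line = selector if query is None else selector + "\t" + query
    return (line + "\r\n").encode("latin-1")

def parse_menu_line(line):
    """Split one Gopher menu line into (type, display, selector, host, port)."""
    fields = line.rstrip("\r\n").split("\t")
    item_type, display = fields[0][:1], fields[0][1:]   # "0" is a text file
    selector, host, port = fields[1], fields[2], int(fields[3])
    return item_type, display, selector, host, port

print(gopher_request("0/the gopher manifesto.txt"))
# b'0/the gopher manifesto.txt\r\n'
print(parse_menu_line(
    "0The Gopher Manifesto\t0/the gopher manifesto.txt\tgopher.heatdeath.org\t70\r\n"))
# ('0', 'The Gopher Manifesto', '0/the gopher manifesto.txt', 'gopher.heatdeath.org', 70)
```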


Working with Newsgroups 

Network News Transfer Protocol, or NNTP, is used to carry the traffic of newsgroups
such as comp.lang.python. The module nntplib provides a class, NNTP, which is a
simple NNTP client. It can connect to a news server and search, retrieve, and post
articles.

See RFC 977 for the full definition of NNTP. 


Most methods of an NNTP object return a tuple, of which the first element is the 
server response string. The string begins with a three-digit status code. 

Dates in nntplib are handled as strings of the form yymmdd, and times are
handled as strings of the form hhmmss. The two-digit year is assumed to be the year
closest to the present, and the time zone assumed is that of the news server.
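These strings are easy to produce with strftime. A sketch; nntp_date_time is a hypothetical helper (nntplib itself was removed in Python 3.13), and it assumes your clock is close enough to the news server's time zone. The results are what you would pass to newgroups or newnews.

```python
from datetime import datetime, timedelta

def nntp_date_time(moment):
    """Format a datetime as the (yymmdd, hhmmss) strings nntplib expects."""
    return moment.strftime("%y%m%d"), moment.strftime("%H%M%S")

since = datetime(2001, 6, 15, 13, 5, 9)
print(nntp_date_time(since))                       # ('010615', '130509')
# One day earlier, e.g. to ask for yesterday's new groups:
print(nntp_date_time(since - timedelta(days=1)))   # ('010614', '130509')
```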

Articles are identified in two ways. Articles are assigned numeric article numbers
within a group in ascending order. Each article also has a unique message-id, a
magic bracketed string unique across all articles in all newsgroups. For instance,
an article cross-posted to rec.org.mensa and alt.religion.kibology might be article
number 200 in rec.org.mensa, article number 500 in alt.religion.kibology, and have
message-id <mwb06.162488$e5.131709@newsfeeds.bigpond.com>.

Some methods are not available on all news servers—the names of these methods 
begin with x (for “extension”). 

Connecting and logging in 

The constructor syntax is NNTP(host[,port[,user[,password
[,readermode]]]]). Here, host is the news server’s host name. The port number,
port, defaults to 119. If the server requires authentication, pass a username and
password in the user and password parameters. If you are connecting to a news
server on the local host, pass a true value for readermode.

Once connected, the getwelcome method returns the server’s welcome message.
When you are finished with the connection, call the quit method to disconnect
from the server.


Browsing groups 

To select a particular newsgroup, call the method group(name). The method
returns a tuple of strings (response,count,first,last,name). Here, count is the
approximate number of messages in the group, first and last are the first and last
article numbers, and name is the group name.
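On the wire, a successful GROUP command is answered with a single status line of the form “211 count first last name” (RFC 977), which is where the five strings come from. A sketch of splitting that line; parse_group_reply is a hypothetical helper for illustration, not an nntplib function.

```python
def parse_group_reply(line):
    """Split an NNTP GROUP reply into (response, count, first, last, name)."""
    words = line.split()
    # A successful reply starts with status 211, e.g. "211 1234 3000 4233 comp.lang.python"
    if not words or words[0] != "211":
        raise ValueError("unexpected GROUP reply: %r" % line)
    count, first, last, name = words[1:5]
    return line, count, first, last, name
```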


The method list examines the newsgroups available on the server. It returns a tuple
(response,grouplist), where response is the server response. The list grouplist has one
element per newsgroup. Each entry is a tuple of the form (name,last,first,postable).
Here, name is the name of the newsgroup, last is the last article number, and first is
the first article number. The flag postable is either “y” if posting is allowed, “n” if
posting is forbidden, or “m” if the group is moderated.



There are thousands of newsgroups out there. Retrieving a list usually takes
several minutes. You may want to take a snack break when you call the list
method!


The following code finds all newsgroups with “fish” in their name: 

GroupList=news.list()[1]
print filter(lambda x:x[0].find("fish")!=-1,GroupList)

New newsgroups appear on USENET constantly. The method newgroups
(date, time) returns all newsgroups created since the specified date and time, in
the same format as the listing from list.


Browsing articles 

New news is good news. The method newnews(name, date, time) finds articles
posted after the specified moment on the group name. It returns a tuple of the form
(response, idlist), where idlist is a list of message-ids.

Once you have entered a group by calling group, you are “pointing at” the first
article. You can move through the articles in the group by calling the methods next
and last. These navigate to the next and the previous article, respectively. They
then return a tuple of the form (response,number,id), where number is the current
article number, and id is its message-id.

The method stat(id) checks the status of an article. Here, id is either an article
number (as a string) or a message-id. It returns the same output as next or last.
On most news servers, you can scan article headers to find the messages you want.
Call the method xhdr(header, articles) to retrieve the values of the header
specified by header. The parameter articles should specify an article range of the form
first-last. The returned value is a tuple (response, headerlist). The entries in
headerlist have the form (id, text), where id is the article number (as a string), and
text is that article’s value for the specified header. For instance, the following code
retrieves subjects for articles 319000 through 319005, inclusive:

>>> news.xhdr("subject","319000-319005")
('221 subject fields follow', [('319000', 'Re: I heartily
endorse: Sinfest!'), ('319001', 'Re: Dr. Teg'), ('319002', 'Re:
If you be my bodyguard'), ('319003', 'Re: Culture shock'),
('319004', 'Re: Dr. Teg'), ('319005', 'Todays lesson')])

The method xover(start,end) gathers more detailed header information for
articles in the range [start,end]. It returns a tuple of the form (response, articlelist).
There is one element in the list articlelist for each article. Each such entry contains
header values in a tuple of the form (article number, subject, poster, date,
message-id, references, size, lines).
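Each overview record arrives from the server as one tab-separated line, and the references field is itself a space-separated list of message-ids. A sketch of that split, assuming the standard eight-field overview format; parse_overview_line is a hypothetical helper, not an nntplib function.

```python
def parse_overview_line(line):
    """Split one tab-separated XOVER line into the eight-field tuple described above."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("short overview line: %r" % line)
    number, subject, poster, date, message_id, references, size, lines = fields[:8]
    # References is a space-separated list of message-ids.
    return (number, subject, poster, date, message_id, references.split(), size, lines)
```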

The method xgtitle(name) finds all the newsgroups matching the specified name
name, which can include wildcards. It returns a tuple of the form (response,
grouplist). Each element of grouplist takes the form (name, description). For example,
here is another (much faster) way to search for groups that talk about fish:

print news.xgtitle("*fish*") 
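xgtitle does its matching on the server. If a server lacks it, a similar pattern can be applied locally to a grouplist you already fetched with list. A sketch using the standard fnmatch module; note that fnmatch’s syntax is close to, but not identical to, the server-side wildcard syntax, and match_groups is a hypothetical helper.

```python
import fnmatch

def match_groups(grouplist, pattern):
    """Return the names in a list()-style grouplist that match a shell-style pattern."""
    # Each grouplist entry is (name, last, first, postable); match on the name only.
    return [entry[0] for entry in grouplist if fnmatch.fnmatchcase(entry[0], pattern)]

sample = [("rec.aquaria.freshwater", "900", "1", "y"),
          ("alt.fishing", "50", "1", "y"),
          ("comp.lang.python", "4233", "3000", "y")]
```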

Reading articles

The method article(id) retrieves the article with the specified id. It returns a
tuple of the form (response, number, id, linelist), where number is the article
number, id is its message-id, and linelist is a list whose elements are the lines of text of
the article. The text in linelist includes all its headers. The methods head(id) and
body(id) retrieve the head and body, respectively.

For example, the simple code in Listing 16-3 dumps all articles by a particular
poster on a newsgroup into one long file:


Listing 16-3: NewsSlurp.py


import nntplib
import sys

def dump_articles(news,TargetGroup,TargetPoster):
    GroupInfo=news.group(TargetGroup)
    # Fetch the "From" header of every article in the group:
    ArticleList=news.xhdr("from",GroupInfo[2]+"-"+GroupInfo[3])
    dumpfile = open("newsfeed.txt","w")
    # xhdr returns (response, headerlist); loop over headerlist.
    for ArticleTuple in ArticleList[1]:
        (ArticleNumber,Poster)=ArticleTuple
        if (Poster.find(TargetPoster)!=-1):
            ArticleText=news.body(ArticleNumber)[3]
            for ArticleLine in ArticleText:
                dumpfile.write(ArticleLine+"\n")
            dumpfile.flush()
    dumpfile.close()

news=nntplib.NNTP("news.fastpointcom.com")
dump_articles(news,"alt.religion.kibology","kibo@world.std.com")
news.quit()


Posting articles

The method post(file) posts, as a new article, the text read from the file object
file. The file text should include the appropriate headers.
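The article text is just headers, a blank line, and the body. A sketch that assembles such text; build_article and the addresses are hypothetical, and the exact headers a server requires vary (From, Newsgroups, and Subject are the usual minimum; the server typically fills in the rest). Plain "\n" line endings are used because post supplies the CRLF endings itself.

```python
def build_article(sender, newsgroups, subject, body):
    """Return article text: headers, a blank line, then the body."""
    headers = ["From: " + sender,
               "Newsgroups: " + newsgroups,
               "Subject: " + subject]
    # A blank line separates the headers from the body.
    return "\n".join(headers + [""] + body.splitlines()) + "\n"

text = build_article("me@example.com", "alt.test", "Testing", "Hello\nWorld")
```

To post it, wrap the text in a file-like object: StringIO.StringIO(text) under Python 2, or io.BytesIO(text.encode("ascii")) for the Python 3 version of nntplib.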

The method ihave(id, file) informs the server that you have an article whose
message-id is id. If the server requests the article, it is posted from the specified file.

Other functions 

The helper method date returns a tuple of the form (response, date, time), where
date and time are of the form yymmdd and hhmmss, respectively. It is not available
on all news servers.

Call set_debuglevel(level) to set the logging level for an NNTP object. The default, 0,
is silent; 2 is the most verbose.

The method help returns a tuple of the form (response, helplines), where helplines
is the server help text in the form of a list of strings. Server help is generally not
especially helpful, but may list the extended commands that are available.

Call the slave method to inform the news server that your session is a helper
(or “slave”) news server, and return the response. This notification generally has no
special effect.

Handling errors

An NNTP object raises various exceptions when things go horribly wrong. NNTPError
is the base class for all exceptions raised by nntplib. NNTPReplyError is raised if the
server sends an unexpected reply. For error codes in the range of 400-499 (for example,
calling next without selecting a newsgroup), NNTPTemporaryError is raised. For
error codes in the range of 500-599 (for example, passing a bogus header to xhdr),
NNTPPermanentError is raised. For replies whose status code does not begin with a
digit from 1 to 5, NNTPProtocolError is raised. Finally, NNTPDataError is raised for
bogus response data.
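The choice among the three status-code exceptions depends only on the first digit of the reply. This offline sketch mirrors that rule so you can see which class a given reply would trigger; it returns class names as strings, and exception_for_reply is a hypothetical helper. In real code you would simply wrap the call (say, news.group(...)) in a try block that catches nntplib.NNTPTemporaryError, or nntplib.NNTPError to catch them all.

```python
def exception_for_reply(response):
    """Name the nntplib exception a reply would trigger, or None for success codes."""
    first = response[:1]
    if first == "4":
        return "NNTPTemporaryError"
    if first == "5":
        return "NNTPPermanentError"
    if first not in ("1", "2", "3"):
        # Anything that does not start with 1-5 is a protocol violation.
        return "NNTPProtocolError"
    return None
```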
Using the Telnet Protocol

The Telnet protocol is used for remote access to a server. Telnet is quite low-level,
only a little more abstract than using a socket directly. For example, you can (if you
are a masochist) read USENET by telnetting to port 119 and entering NNTP
commands by hand.

See RFC 854 for a definition of the Telnet protocol.


The module telnetlib defines a class, Telnet, which you can use to handle a Telnet
connection to a remote host.

Connecting 

The Telnet constructor has the syntax Telnet([host[,port]]). If you pass a host
name in the parameter host, a session will be opened to the host. The port number,
optionally passed via the parameter port, defaults to 23. If you don’t connect when
constructing the object, you can connect by calling open(host[,port]). Once you
are finished with a session, call the close method to terminate it.

Note: After establishing a connection, do not call the open method again for the same
Telnet object.

Reading and writing 

You can run a simple Telnet client (reading from stdin and printing server
responses to stdout) by calling the interact method. The method mt_interact is
a multithreaded version of interact. For example, the following lines would connect
you to an online MUD (Multi-User Dungeon) game:

>>> link=telnetlib.Telnet("materiamagica.com",4000)
>>> link.interact()

Writing data is simple: To send data to the server, call the method write(string).
The special IAC (Interpret As Command) character, chr(255), is escaped
(doubled).
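write performs this doubling for you (telnetlib replaces each IAC byte in the outgoing buffer with two), so you never need to do it by hand. The sketch below only shows what happens to the bytes on the wire; escape_iac is a hypothetical name.

```python
IAC = b"\xff"  # byte 255, Interpret As Command

def escape_iac(data):
    """Double every IAC byte so the server treats it as data, not as a command."""
    return data.replace(IAC, IAC + IAC)
```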

Reading data from the server is a bit more complicated. The Telnet object keeps a
buffer of data read so far from the server; each read method accesses buffered (or
“cooked”) data before reading more from the server. Each returns data read as a
(possibly empty) string. The following read methods are available:

♦ read_all: Read until EOF. Block until the server closes the connection.

♦ read_some: Read at least one character (unless EOF is reached). Block if
data is not immediately available.

♦ read_very_eager: Read all available data, without blocking unless in the
middle of a command sequence.

♦ read_eager: Same as read_very_eager, but does not read more from the
server if cooked data is available.

♦ read_lazy: Read all cooked data. Does not block unless in the middle of a
command sequence.

♦ read_very_lazy: Read all cooked data. Never blocks.

The read methods, except read_all and read_some, raise an EOFError if the
connection is closed and no data is buffered. For example, if you use read_very_lazy
exclusively for reading, the only way to be certain the server is finished is if an
EOFError is raised. For most purposes, you can just call read_some and ignore the
other methods.

For example, the following code connects to port 7 (the echo port) and talks to 
itself: 

echo=telnetlib.Telnet("gianth.com",7)
echo.write("Hello!")
# read_some blocks until the echoed data arrives:
print echo.read_some()

Watching and waiting 

The method read_until(expected[,timeout]) reads from the server until it
encounters the string expected, or until timeout seconds have passed. If timeout is
not supplied, it waits indefinitely. The method returns whatever data was read,
possibly the empty string. It raises EOFError if the connection is closed and no data is
buffered.

A more powerful method expect(targets[, timeout]) watches for a list of 
strings or regular expression objects, provided in the parameter targets. It returns 
a tuple of the form (matchindex, match, text), where matchindex is the index (in 
targets) of the first matched item, match is a match object, and text is the text read 
up to and including the match. If no match was found, matchindex is -1, match is 
None, and text is the text read, if any. 
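To see how the three return values relate, the following offline sketch applies the same rule to a string that has already been read; expect_in_text is hypothetical and does no network I/O. As in telnetlib, plain strings are compiled as regular expressions and targets are tried in list order, so an earlier target wins even if a later one matches sooner in the text.

```python
import re

def expect_in_text(text, targets):
    """Mimic expect()'s (index, match, text) result against already-read text."""
    # Strings are treated as regular expressions, as expect() does.
    patterns = [re.compile(t) if isinstance(t, str) else t for t in targets]
    for index, pattern in enumerate(patterns):
        match = pattern.search(text)
        if match:
            # Return everything up to and including the match.
            return index, match, text[:match.end()]
    return -1, None, text
```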

Other methods 

The method set_debuglevel(level) sets the level of debug logging. A level of 0 (the
default) is silent; level 2 is the most verbose.

The method get_socket returns the socket object used internally by a Telnet
object. The method fileno returns the file descriptor of the socket object.
Writing CGI Scripts

Many Web pages respond to input from the user; these pages range from simple
feedback forms to sophisticated shopping Web sites. Common Gateway Interface
(CGI) is a standard way for the Web server to pass user input into a script. The
module cgi enables you to build Python scripts to handle user requests to your
Web site.

Your CGI script should output headers, a blank line, and then content. The one 
required header is Content-type, and its usual value is “text/html.” For example, 
Listing 16-4 is a very simple CGI script, which returns a static Web page: 


Listing 16-4: HelloWorld.py


# (add #! line here under UNIX, or if using Apache on Windows)

import cgi

# Part 1: Content-Type header, followed by a blank line
# to indicate the end of the headers.
print "Content-Type: text/html\n"

# Part 2: A simple HTML page
print "<title>Gumby</title>"
print "<html><body>My brain hurts!</body></html>"


Setting up CGI Scripts 

Making your Web server run a script is half the battle. In general, you must do the 
following: 

1. Put the script in the right place. 

2. Make it executable. 

3. Make it execute properly.

Configuration details vary by Web server and operating system, but the following
sections provide information for some common cases.

Windows Internet Information Server (IIS) 

First, create a directory (below your root Web directory) for CGI files. A common
name is cgi-bin.

Next, bring up the Internet Services Manager. In Windows 2000, go to Start >
Control Panel > Administrative Tools > Internet Services Manager.

In Internet Services Manager, edit the properties of the CGI directory. In the
Application section, click Configuration... (if Configuration is disabled, click
Add first). This brings up the Application Configuration dialog. On the App
Mappings tab, add an entry mapping the extension .py to python.exe -u %s %s.
The -u setting makes Python run in unbuffered binary mode. The %s %s ensures that
IIS runs your script (and not just an instance of the interpreter!).

UNIX 

Put your scripts in the appropriate CGI directory, probably cgi-bin. Make sure the
script is executable by everyone (chmod 755 script.py). In addition, make sure any
files it reads or writes are accessible by everyone. To make sure the script is executed
as a Python script, add a “pound-bang” line to the very top of the script, as follows:

#!/usr/local/bin/python 
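The usual mode for a CGI script is 755: the owner can read, write, and execute it, and everyone else can read and execute it. If you would rather set this from Python (in an install script, say), os.chmod with the stat constants produces the same mode. A sketch; make_cgi_executable is a hypothetical name, and the permission bits only have this meaning on UNIX-like systems.

```python
import os
import stat

def make_cgi_executable(path):
    """Set mode 755: owner read/write/execute, group and others read/execute."""
    mode = (stat.S_IRWXU |
            stat.S_IRGRP | stat.S_IXGRP |
            stat.S_IROTH | stat.S_IXOTH)
    os.chmod(path, mode)
    return mode
```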

Apache (any operating system) 

To set up a CGI directory under Apache, add a ScriptAlias line to httpd.conf that
points at the directory. In addition, make sure there is a <Directory> entry for that
folder, and that it permits execution. For example, my configuration file includes the
following lines:

ScriptAlias /cgi-bin/ "C:/Webroot/cgi-bin/"

<Directory "C:/Webroot/cgi-bin">
    AllowOverride None
    Options None
</Directory>

Apache uses the “pound-bang hack” to decide how to execute CGI scripts, even on
Windows. For example, I use the following simple test script to test CGI under
Apache:

#! python
import cgi

cgi.test() # the test function exercises many CGI features

Accessing form fields 

To access form fields, instantiate one (and only one) cgi.FieldStorage object.
The master FieldStorage object can be used like a dictionary. Its keys are the
submitted field names. Its values are also FieldStorage objects. (Actually, if there are
multiple values for a field, then its corresponding value is a list of FieldStorage
objects.)

The FieldStorage object for an individual field has a value attribute containing the
field’s value as a string. It also has a name attribute containing the field name
(possibly None).

For example, the script in Listing 16-5 (and its corresponding Web page) gathers 
and e-mails site feedback. Listing 16-6 is a Web page that uses the script to handle 
form input. 
Listing 16-5: Feedback.py


#!python

import cgi
import smtplib
import sys
import traceback

# Set these e-mail addresses appropriately
SOURCE_ADDRESS="robot_form@gianth.com"
FEEDBACK_ADDRESS="dumplechan@seanbaby.com"

sys.stderr = sys.stdout
print "Content-Type: text/html\n"

try:
    fields=cgi.FieldStorage()
    if (fields.has_key("name") and fields.has_key("comments")):
        UserName=fields["name"].value
        Comments=fields["comments"].value
        # Mail the feedback:
        Mailbox=smtplib.SMTP("mail.seanbaby.com")
        MessageText="From: <"+SOURCE_ADDRESS+">\r\n"
        MessageText+="To: "+FEEDBACK_ADDRESS+"\r\n"
        MessageText+="Subject: Feedback\r\n\r\n"
        MessageText+="Feedback from "+UserName+":\r\n"+Comments
        Mailbox.sendmail(SOURCE_ADDRESS, FEEDBACK_ADDRESS,
                         MessageText)
        Mailbox.quit()
        # Print a simple thank-you page:
        print "<h1>Thanks!</h1>Thank you for your feedback!"
    else:
        # They must have left "name" and/or "comments" blank:
        print "<h1>Sorry...</h1>"
        print "You must provide a name and some comments too!"
except:
    # Print the traceback to the response page, for debugging!
    print "\n\n<PRE>"
    traceback.print_exc()


Listing 16-6: Feedback.html 


<html>
<title>Feedback form</title>
<h1>Submit your comments</h1>
<form action="cgi-bin/Feedback.py" method="POST">
Your name: <input type="text" size="35" name="name">
<br>
Comments: <br>
<textarea name="comments" rows="5" cols="35"></textarea>
<input type="submit" value="Send!">
</form>
</html>


Advanced CGI functions 

You can retrieve field values directly from the master FieldStorage object by calling
the method getvalue(fieldname[,default]). It returns the value of field
fieldname, or (if no value is available) the value default. If not supplied, default is None. If
there are multiple values for a field, getvalue returns a list of strings.
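getvalue’s rule (missing field gives the default, one value gives a string, several values give a list) can be tried without a Web server. This Python 3 sketch applies that rule to a raw query string parsed with urllib.parse.parse_qs; it illustrates the behavior rather than using FieldStorage itself, and get_field_value is a hypothetical helper.

```python
from urllib.parse import parse_qs

def get_field_value(query, name, default=None):
    """Apply getvalue()'s rule to a raw query string."""
    # parse_qs maps each field name to a list of its (non-blank) values.
    values = parse_qs(query).get(name)
    if not values:
        return default
    return values[0] if len(values) == 1 else values
```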

If a field value is actually a file, accessing the value attribute of the corresponding
FieldStorage object returns the file’s contents as one long string. In this case, the
filename attribute is set to the file’s name (as given by the client), and the file
attribute is an opened file object.

A FieldStorage object provides some other attributes: 

♦ type: Content-type as a string (or None if unspecified)

♦ type_options: Dictionary of options passed with the content-type header

♦ disposition: Content-disposition as a string (or None if unspecified)

♦ disposition_options: Dictionary of options passed with the
content-disposition header

♦ headers: Map of all headers and their values

A note on debugging 

Debugging CGI scripts can be difficult, because the traceback from a crashed script
may be buried deep in the bowels of the Web server’s logging. Listing 16-7 uses a
trick to make debugging easier.


Listing 16-7: CGIDebug.py 


import sys
import traceback

sys.stderr = sys.stdout
print "Content-Type: text/html\n"

try:
    # The script body goes here!
    pass
except:
    print "\n\n<PRE>"
    traceback.print_exc()
Pointing stderr at stdout means that the output of print_exc goes to the resulting
Web page. The <PRE> tag ensures that the text is shown exactly as printed.
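A variation on this trick captures the traceback as a string instead of reassigning sys.stderr, and escapes it so that characters such as < in the traceback are not mistaken for HTML. A Python 3 sketch (html.escape is Python 3; in Python 2, cgi.escape plays the same role); run_for_web is a hypothetical name.

```python
import html
import traceback

def run_for_web(body):
    """Call body(); on failure, return the traceback as HTML-safe preformatted text."""
    try:
        return body()
    except Exception:
        # format_exc() returns the same text that print_exc() would print.
        # Escaping keeps "<" and "&" in the traceback from being read as markup.
        return "\n\n<PRE>" + html.escape(traceback.format_exc()) + "</PRE>"
```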

A note on security 

Internet security is crucial, even for casual users and simple sites. A common
vulnerability is a CGI script that executes a command string passed from a Web
request. Therefore, avoid passing user-supplied values to os.system, or accessing
file names derived from user data. Remember that hidden fields on forms are
hidden for presentation purposes only; enterprising users can see and manipulate
their values.
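One defensive pattern is to accept only values that match a strict whitelist before they come anywhere near a file name. A sketch; the “letters, digits, underscore, hyphen, .txt only” rule, safe_data_path, and base_dir are illustrative assumptions you would adapt to your site. Note the \Z anchor: a plain $ would also accept a trailing newline.

```python
import os
import re

SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.txt\Z")

def safe_data_path(base_dir, user_value):
    """Return a path inside base_dir for a user-supplied name, or None if it looks unsafe."""
    # Whitelist the allowed characters instead of trying to strip out bad ones.
    if not SAFE_NAME.match(user_value):
        return None
    return os.path.join(base_dir, user_value)
```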

For a good introduction to Web security, see the World Wide Web Consortium’s
security FAQ at http://www.w3.org/Security/Faq/www-security-faq.html.


Summary 

Python provides simple client implementations of many Internet protocols. Python
also makes a great CGI scripting language. In this chapter, you:

♦ Sent and received e-mail.

♦ Retrieved Web pages and files in various ways.

♦ Created a Web page with a simple feedback form.

In the next chapter, you will meet various modules that help handle many flavors of 
Internet data. 


Chapter 17

Handling Internet Data


Internet data takes many forms. You may find yourself
working with e-mail messages, mailboxes, cookies, URLs,
and more. Python’s libraries include helper modules for
handling this data. This chapter introduces modules to help
handle several common tasks in Internet programming:
handling URLs, sending e-mail, handling cookies from the
World Wide Web, and more.

In This Chapter

♦ Manipulating URLs
♦ Formatting text
♦ Reading Web spider robot files
♦ Viewing files in a Web browser
♦ Dissecting e-mail messages
♦ Working with MIME encoding
♦ Encoding and decoding message data
♦ Working with UNIX mailboxes
♦ Using Web cookies


Manipulating URLs

A Uniform Resource Locator (URL) is a string that serves as the
address of a resource on the Internet. The module urlparse
provides functions to make it easier to manipulate URLs.

The function urlparse(url[,default_scheme[,allow_fragments]])
parses the string url, splitting the URL into a tuple of the form
(scheme, host, path, parameters, query, fragment). For example:

>>> URLString="http://finance.yahoo.com/q?s=MWT&d=v1"
>>> print urlparse.urlparse(URLString)
('http', 'finance.yahoo.com', '/q', '', 's=MWT&d=v1', '')

The optional parameter default_scheme specifies an addressing
scheme to use if none is specified. For example, the following
code parses a URL with and without a default scheme:


>>> URLString="//gianth.com/stuff/junk/DestroyTheWorld.exe"
>>> print urlparse.urlparse(URLString) # no scheme!
('', 'gianth.com', '/stuff/junk/DestroyTheWorld.exe', '', '', '')
>>> print urlparse.urlparse(URLString,"ftp")
('ftp', 'gianth.com', '/stuff/junk/DestroyTheWorld.exe', '', '', '')
The parameter allow_fragments defaults to true. If set to false, fragments are not
split out; the # and everything after it stay in the path:

>>> URLString="http://www.penny-arcade.com/#food"
>>> print urlparse.urlparse(URLString)
('http', 'www.penny-arcade.com', '/', '', '', 'food')
>>> print urlparse.urlparse(URLString,"",0)
('http', 'www.penny-arcade.com', '/#food', '', '', '')

The function urlunparse(tuple) unparses a tuple back into a URL string. 

Parsing and then unparsing yields a URL string that is equivalent (and quite
possibly identical) to the original.
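Because the parsed tuple can be rebuilt, a common pattern is to parse a URL, swap one component, and unparse it. A sketch in Python 3, where these functions live in urllib.parse (the same functions are in the urlparse module under Python 2); round_trip is a hypothetical helper.

```python
from urllib.parse import urlparse, urlunparse

def round_trip(url):
    """Parse a URL into its six parts and put it back together."""
    return urlunparse(urlparse(url))

parts = urlparse("http://finance.yahoo.com/q?s=MWT&d=v1")
# Replace the query (index 4), keeping the other five components.
changed = urlunparse(parts[:4] + ("s=PY",) + parts[5:])
```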

The function urljoin(base, url[,allow_fragments]) merges a base URL (base)
with a new URL (url) to create a new URL string. It is useful for processing anchors
when parsing HTML. For example:

>>> CurrentPage="http://gianth.com/stuff/junk/index.html"
>>> print urlparse.urljoin(CurrentPage,"/foo.html")
http://gianth.com/foo.html

The parameter allow_fragments has the same usage as in urlparse.
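The result of urljoin depends on the form of the link being resolved. A short Python 3 sketch (urllib.parse; the function is in urlparse under Python 2) showing the common cases for links found on a page such as gianth.com/stuff/junk/index.html:

```python
from urllib.parse import urljoin

base = "http://gianth.com/stuff/junk/index.html"
# A relative link resolves against the base page's directory...
relative = urljoin(base, "foo.html")
# ...a leading slash starts again from the server root...
rooted = urljoin(base, "/foo.html")
# ...".." climbs one directory, and a full URL replaces the base entirely.
parent = urljoin(base, "../foo.html")
absolute = urljoin(base, "http://www.python.org/")
```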



The module urllib includes functions to encode strings as valid URL components.
See "Manipulating URLs" in Chapter 16.


Formatting Text 


The formatter module defines interfaces for formatters and writers. A formatter
handles requests for various kinds of text formatting, such as fonts and margins. It
passes formatting requests along to a writer. In particular, it keeps a stack of fonts
and margins, so that it knows which settings to revert to after turning off the
“current” font or margins. Formatters and writers are useful for translating text between
formats, or for displaying formatted text. They are used by htmllib.HTMLParser.


Formatter interface 


The formatter attribute writer is the writer object corresponding to the formatter. 

Writing text 

The methods add_flowing_data(text) and add_literal_data(text) both
send text to the writer. The difference between the two is that add_flowing_data
collapses extra whitespace; whitespace is held in the formatter before being passed
to the writer. The method flush_softspace clears buffered whitespace from the
formatter.
The method add_label_data(format, counter) sends label text (as used in a
list) to the writer. If format is a string, it is used to format the numeric value counter
(in a numbered list). Otherwise, format is passed along to the writer directly.

If you manipulate the writer directly, call flush_softspace beforehand, and call
assert_line_data([flag]) after adding any text. The parameter flag, which
defaults to 1, should be false if the added data finished with a line break.

Spacing, margins, and alignment 

The method set_spacing(spacing) sets the desired line spacing to spacing.

The methods push_alignment(align) and pop_alignment set and restore
alignment. Here, align is normally left, right, center, justify (full), or None (default).

The methods push_margin(name) and pop_margin increase and decrease the
current level of indentation; the parameter name is a name for the new indentation
level. The initial margin level is 0; all other margin levels must have names that
evaluate to true.

The method add_line_break adds a line break (at most, one in succession), but
does not finish the current paragraph. The method end_paragraph(lines) ends
the current paragraph and inserts at least lines blank lines. Finally, the method
add_hor_rule adds a horizontal rule; its parameters are formatter- and
writer-dependent, and are passed along to the writer’s method send_hor_rule.

Fonts and styles 

The method push_font(font) pushes a new font definition, font, of the form
(size,italics,bold,teletype). Entries set to formatter.AS_IS keep the current font’s
value. The new font is passed to the writer’s new_font method. The method pop_font
restores the previous font.
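A simplified model of that bookkeeping shows how AS_IS entries inherit from the current font. This is a sketch, not the formatter module itself (which was removed in Python 3.10); FontStack is a hypothetical class, and AS_IS is set to None because that is its value in the standard module.

```python
AS_IS = None  # formatter.AS_IS has this value in the standard module

class FontStack:
    """Track (size, italic, bold, teletype) fonts the way push_font/pop_font describe."""

    def __init__(self):
        self.stack = []

    def push_font(self, font):
        # Any AS_IS entry inherits the corresponding value of the current font.
        if self.stack:
            font = tuple(cur if new is AS_IS else new
                         for new, cur in zip(font, self.stack[-1]))
        self.stack.append(font)
        return font  # the real formatter passes this to writer.new_font

    def pop_font(self):
        # Drop the current font; the previous one (or None for the default) is restored.
        if self.stack:
            self.stack.pop()
        return self.stack[-1] if self.stack else None
```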

The method push_style(*styles) passes any number of style definitions. A tuple
of all style definitions is passed to the writer’s method new_styles. The method
pop_style([count]) pops count styles (by default, 1), and passes the revised
stack to new_styles.

Writer interface 

Writers provide various methods to print or display text. Normally, the formatter
calls these methods, but a caller can access the writer directly.

Writing text 

The methods send_flowing_data(text) and send_literal_data(text) both
output text. The difference between the two is that send_literal_data sends
pre-formatted text, whereas send_flowing_data sends text with redundant
whitespace collapsed. The method send_label_data(text) sends text intended
for a list label; it is called only at the beginning of a line.

The method flush is called to flush any buffered output.


Spacing, margins, and alignment 

The method send_line_break breaks the current line. The method
send_paragraph(lines) is called to end the current paragraph and send at least lines
blank lines. The method new_spacing(spacing) sets the line spacing to
spacing. The method send_hor_rule is called to add a horizontal rule; its arguments
are formatter- and writer-dependent.

The method new_margin(name, level) sets the margin level to level, where the
indentation level’s name is name.

The method new_alignment(align) sets line alignment. Here, align is normally
left, right, center, justify (full), or None (default).

Fonts and styles 

The method new_font(font) sets the current font to font, where font is either None
(indicating the default font), or a tuple of the form (size,italic,bold,teletype).

The method new_styles(styles) is called to set new style(s); pass a tuple of new
style values in styles.
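Because writers are called by method name, any object with the right methods can act as one. The duck-typed sketch below implements a few of the writer methods listed above and collects output lines in a list, which makes the flowing-versus-literal difference easy to see. ListWriter is hypothetical; a complete writer would implement every method above, and the formatter module itself is not used (it was removed in Python 3.10).

```python
import re

class ListWriter:
    """A tiny writer that collects output lines; implements only a few writer methods."""

    def __init__(self):
        self.lines = [""]

    def send_flowing_data(self, text):
        # Flowing data has runs of whitespace collapsed to a single space.
        self.lines[-1] += re.sub(r"\s+", " ", text)

    def send_literal_data(self, text):
        # Literal (pre-formatted) data is kept exactly, including newlines.
        parts = text.split("\n")
        self.lines[-1] += parts[0]
        self.lines.extend(parts[1:])

    def send_line_break(self):
        self.lines.append("")

    def send_paragraph(self, blankline):
        # End the current line, then add the requested number of blank lines.
        self.lines.extend([""] * (blankline + 1))

    def new_font(self, font):
        pass  # this writer ignores fonts

    def flush(self):
        pass
```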

Other module resources 

The AbstractFormatter is a simple formatter that you can use for most
applications. The NullFormatter is a trivial implementation of the formatter interface: it
has all the available methods, but they do nothing. It is useful for creating an
HTMLParser that does not format Web pages.

The NullWriter is a writer that does nothing. The AbstractWriter is useful for
debugging formatters; method calls are simply logged. The DumbWriter is a simple
writer that outputs word-wrapped text. Its constructor has the syntax
DumbWriter([file[,maxcol]]). Here, file is an open filelike object for output (if none is
specified, text is written to standard output); and maxcol (which defaults to 72) is
the maximum width, in characters, of a line. For example, this function prints a
text-only version of a Web page:

import htmllib
import urllib
import formatter

def PrintTextPage(URL):
    URLFile = urllib.urlopen(URL)
    HTML = URLFile.read()
    URLFile.close()
    # DumbWriter prints word-wrapped text to standard output.
    parser=htmllib.HTMLParser(
        formatter.AbstractFormatter(formatter.DumbWriter()))
    parser.feed(HTML)
    parser.close()


Reading Web Spider Robot Files 

A robot is a program that automatically browses the Web. For example, a script 
could programmatically check CD prices at several online sites in order to find the 
best price. Some Webmasters would prefer that robots not visit their systems.
Therefore, a well-behaved robot should check a host’s Web root for a file named
robots.txt, which specifies any URLs that are off-limits.

The module robotparser provides a class, RobotFileParser, which makes it easy
to parse robots.txt. Once you instantiate a RobotFileParser, call its
set_url(url) method to point it at the robots.txt file at the specified URL url. Then,
call its read method to parse the file. Before retrieving a URL, call
can_fetch(useragent, url) to determine whether the specified URL is allowed.
The parameter useragent should be the name of your robot program. For example,
Listing 17-1 tests a “polite get” of a URL:


Listing 17-1: PoliteGet.py


import robotparser
import urlparse
import urllib

def PoliteGet(url):
    """Return an open url-file, or None if URL is forbidden."""
    RoboBuddy=robotparser.RobotFileParser()
    # Grab the host-name from the URL:
    URLTuple=urlparse.urlparse(url)
    RobotURL="http://"+URLTuple[1]+"/robots.txt"
    RoboBuddy.set_url(RobotURL)
    RoboBuddy.read()
    if RoboBuddy.can_fetch("I,Robot",url):
        return urllib.urlopen(url)
    else:
        return None

if (__name__=="__main__"):
    URL="http://www.nexor.com/cgi-bin/rfcsearch/location?2449"
    print "Forbidden:",(PoliteGet(URL)==None)
    URL="http://www.yahoo.com/r/sq"
    print "Allowed:",(PoliteGet(URL)!=None)
You can manually pass a list of robots.txt lines to a RobotFileParser by calling
the method parse(lines).
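Because parse accepts the lines directly, you can test a robots.txt policy without fetching anything. A Python 3 sketch (the module is urllib.robotparser there; robotparser in Python 2); the rules and URLs are made-up examples.

```python
from urllib.robotparser import RobotFileParser

robots_lines = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /private/",
]

rp = RobotFileParser()
# parse() takes the lines directly, so no network access is needed.
rp.parse(robots_lines)
```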

If your robot runs for many days or weeks, you may want to re-read robots.txt
periodically. RobotFileParser keeps a “last updated” timestamp. Call the method
modified to set the timestamp to the current time. (This is done automatically
when you call read or parse.) Call mtime to retrieve the timestamp, in ticks.


Viewing Files in a Web Browser 

The module webbrowser provides a handy interface for opening URLs in a browser.
The function open(url[,new]) opens the specified URL using the default browser.
If the parameter new is true, a new browser window is opened if possible. The
function open_new(url) is a synonym for open(url,1).

Normally, pages are displayed in their own window. However, on UNIX systems for
which no graphical browser is available, a text browser will be opened (and the
program will block until the browser session is closed).

If you want to open a particular browser, call the function register(name,
class[,instance]). Here, name is one of the names shown in Table 17-1, and either
class is the corresponding class, or instance is an instance of the corresponding class.


Table 17-1
Available Browsers

Name              Class                Platform
netscape          Netscape             All
kfm               Konqueror            UNIX
grail             Grail                All
windows-default   WindowsDefault       Windows
internet-config   InternetConfig       Macintosh
command-line      CommandLineBrowser   All


Once a browser is registered, you can call get(name) to retrieve a controller for it.
The controller provides open and open_new methods similar to the functions of
the same names. For example, the following code asks for the Grail browser by
name, and then uses it to view a page:

>>> webbrowser.register("grail",webbrowser.Grail)
>>> Controller=webbrowser.get("grail")
>>> Controller.open("http://www.python.org")
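register also accepts an instance of your own controller class; when an instance is supplied, the class argument is never called and may be None. The sketch below registers a stand-in that records requested URLs instead of starting a browser, which can be handy in tests. RecordingBrowser and the name "recorder" are hypothetical; subclassing webbrowser.BaseBrowser is an assumption about how you might structure such a controller.

```python
import webbrowser

class RecordingBrowser(webbrowser.BaseBrowser):
    """A stand-in controller that records URLs instead of launching anything."""

    def __init__(self, name="recorder"):
        webbrowser.BaseBrowser.__init__(self, name)
        self.opened = []

    def open(self, url, new=0, autoraise=True):
        self.opened.append((url, new))
        return True

recorder = RecordingBrowser()
# Register an instance under a new name; get("recorder") will return it.
webbrowser.register("recorder", None, recorder)
```

Because open_new(url) is implemented on the base class as open(url, 1), the recorder sees new=1 for such calls.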
Dissecting E-Mail Messages

E-mail messages have headers with a standard syntax. The syntax, described in RFC
822, is a bit complicated. Fortunately, the module rfc822 can parse these headers
for you. It also provides a class to help handle lists of addresses.

Parsing a message 

To parse a message, call the constructor Message(file[,seekable]). Here, file is 
an open file. The file is parsed, and all headers are matched case-insensitively. 

The file parameter can be any filelike object with a readline method; it must also
have seek and tell methods in order for Message.rewindbody to work. If file is
unseekable (for example, it wraps a socket), set seekable to 0 for maximum portability.


Retrieving header values 

The method get(name[,default]) returns the last value of header name, or default
(by default, None) if no value was found. Leading and trailing whitespace is trimmed
from the header; newlines are removed if the header takes up multiple lines. The
method getheader is a synonym for get. The method getrawheader(name)
returns the first header name with whitespace (including trailing linefeed) intact, or
None if the header was not found.

If a header can have multiple values, you can use getallmatchingheaders(name)
to retrieve a (raw) list of all header lines matching name. The method
getfirstmatchingheader(name) returns a list of lines for the first match:

>>> MessageFile=open("msg1.txt")
>>> msg=rfc822.Message(MessageFile)
>>> msg.get("received") # The last value
'from 216.20.160.186 by lw8fd.law8.hotmail.msn.com with
HTTP;\011Thu, 28 Dec 2000 23:37:18 GMT'
>>> msg.getrawheader("RECEIVED") # the first value:
' from hotmail.com [216.33.241.22] by mail3.oldmanmurray.com
with ESMTP\012 (SMTPD32-6.05) id AB8884C01EE; Thu, 28 Dec 2000
18:23:52 -0500\012'
>>> msg.getallmatchingheaders("Received") # ALL values:
['Received: from hotmail.com [216.33.241.22] by
mail3.oldmanmurray.com with ESMTP\012', ' (SMTPD32-6.05) id
AB8884C01EE; Thu, 28 Dec 2000 18:23:52 -0500\012', 'Received:
from mail pickup service by hotmail.com with Microsoft
SMTPSVC;\012', '\011 Thu, 28 Dec 2000 15:37:19 -0800\012',
'Received: from 216.20.160.186 by lw8fd.law8.hotmail.msn.com
with HTTP;\011Thu, 28 Dec 2000 23:37:18 GMT\012']
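In Python 3 the rfc822 module no longer exists; the email package takes over its role. A rough equivalent of the calls above, assuming Python 3 and an invented two-header message, uses get_all to fetch every value of a repeated header. Note that email's get does not promise to return the last value when a header repeats, so indexing the get_all list is the safer way to pick the first or last one.

```python
from email import message_from_string

# A small hypothetical message with two Received: headers.
RAW = (
    "Received: from hotmail.com by mail3.example.com; Thu, 28 Dec 2000 18:23:52 -0500\n"
    "Received: from 216.20.160.186 by hotmail.com; Thu, 28 Dec 2000 23:37:18 GMT\n"
    "From: Stephen Tanner <dumplechan@hotmail.com>\n"
    "Subject: hello\n"
    "\n"
    "Body text\n"
)

msg = message_from_string(RAW)

# Header names are matched case-insensitively.
all_received = msg.get_all("RECEIVED")  # every value, in file order
first_value = all_received[0]
last_value = all_received[-1]
missing = msg.get("X-Not-There", "no such header")  # default if absent
```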
Some headers are dates. Call getdate(name) to retrieve the value of header name
as a TimeTuple. Alternatively, call getdate_tz(name) to retrieve a 10-tuple; its first
nine entries form a TimeTuple, and the tenth is the time zone’s offset (in seconds) from
UTC. (Entries 6, 7, and 8 are unusable in each case.) For example:

>>> msg.getdate("date")
(2000, 12, 28, 16, 37, 18, 0, 0, 0)
>>> msg.getdate_tz("date")
(2000, 12, 28, 16, 37, 18, 0, 0, 0, -25200)
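In Python 3 the same parsing lives in email.utils. This sketch, assuming Python 3 and an invented Date: header, shows that the tenth entry of parsedate_tz's result is the offset in seconds (-25200 is -7 hours). Entries 6 through 8 are filled with fixed placeholder values rather than a real weekday and day of the year, so they are still best ignored.

```python
from email import message_from_string
from email.utils import parsedate, parsedate_tz

msg = message_from_string("Date: Thu, 28 Dec 2000 16:37:18 -0700\n\nbody\n")

# parsedate returns a 9-item time tuple; parsedate_tz adds the UTC offset.
date_tuple = parsedate(msg["Date"])
date_tz = parsedate_tz(msg["Date"])
offset_seconds = date_tz[9]  # -0700 becomes -7 * 3600 seconds
```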


The method getaddr(name) helps parse To: and From: headers, returning their
values in the form (full name, e-mail address). If the header name is not found, it
returns (None, None). For example:

>>> msg.getaddr("From")
('Stephen Tanner', 'dumplechan@hotmail.com')
>>> msg.getaddr("PurpleFlairySpiders")
(None, None)
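The closest Python 3 counterpart to getaddr is email.utils.parseaddr, which works on a header value rather than on a message. A minimal sketch, assuming Python 3; note that a missing or empty value yields a pair of empty strings instead of (None, None).

```python
from email.utils import parseaddr

name, address = parseaddr("Stephen Tanner <dumplechan@hotmail.com>")
# An empty or missing header value yields ('', ''), not (None, None).
empty = parseaddr("")
```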

Other members 

The method rewindbody seeks to the start of the message body (if the filelike
object parsed supports seeking).

A Message object supports the methods of a dictionary—for example, keys 
returns a list of headers found. The attribute fp is the original file parsed, and the
attribute headers is a list of all header lines. 

If you need to subclass Message, you may want to override some of its parsing 
methods. The method islast(line) returns true if line marks the end of header
lines. By default, islast returns true when passed a blank line. The method
iscomment(line) returns true if line is a comment that should be skipped. Finally,
the method isheader(line) returns the header name if line is a valid header line, 
or None if it is not. 

Address lists 

The class AddressList holds a list of e-mail addresses. Its constructor takes a string
containing a comma-separated list of addresses (such as the value of a To: header);
passing None results in an AddressList with no entries.

You can take the length of an AddressList, add (merge) two AddressLists, remove
(subtract) the elements of one AddressList from another AddressList, and retrieve a
canonical string representation:

>>> List1 = rfc822.AddressList(msg.getheader("To"))
>>> List2 = rfc822.AddressList(msg.getheader("From"))
>>> MergedList=List1+List2 # Merge lists
>>> len(MergedList) # access list length
2
>>> str(MergedList) # canonical representation
'dumplechan@seanbaby.com, "Stephen Tanner" <dumplechan@hotmail.com>'
>>> str(MergedList-List1) # remove one list's elements
'"Stephen Tanner" <dumplechan@hotmail.com>'

An AddressList also provides the attribute addresslist, a list of tuples of the form
(full name, e-mail address):

>>> MergedList.addresslist
[('', 'dumplechan@seanbaby.com'), ('Stephen Tanner',
'dumplechan@hotmail.com')]
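Python 3 has no AddressList class, but email.utils.getaddresses produces the same kind of (full name, e-mail address) list. The sketch below, assuming Python 3 and reusing the addresses above, merges two header values, imitates subtraction with an ordinary list filter, and rebuilds a header string with formataddr; the filtering step is our own approximation, not a library feature.

```python
from email.utils import formataddr, getaddresses

to_header = "dumplechan@seanbaby.com"
from_header = '"Stephen Tanner" <dumplechan@hotmail.com>'

# getaddresses accepts a list of header values and returns (name, address) pairs.
merged = getaddresses([to_header, from_header])

# Rough analogue of AddressList subtraction: drop addresses found in another list.
to_addresses = {addr for _, addr in getaddresses([to_header])}
remaining = [pair for pair in merged if pair[1] not in to_addresses]

# formataddr rebuilds a header-ready string from a (name, address) pair.
canonical = ", ".join(formataddr(pair) for pair in merged)
```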


rfc822 utility functions

The functions parsedate(str) and parsedate_tz(str) parse the string str, in
the manner of the Message methods getdate and getdate_tz. The function
mktime_tz(tuple) converts a 10-tuple, as returned by parsedate_tz, into a UTC
timestamp.
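The same two functions survive in Python 3's email.utils. This sketch, assuming Python 3, turns the date from the earlier example into a UTC timestamp and checks it against calendar.timegm applied to the equivalent GMT time (16:37 at -0700 is 23:37 GMT).

```python
import calendar
from email.utils import formatdate, mktime_tz, parsedate_tz

parsed = parsedate_tz("Thu, 28 Dec 2000 16:37:18 -0700")
timestamp = mktime_tz(parsed)  # seconds since the epoch, in UTC

# The same instant written in UTC, for comparison.
expected = calendar.timegm((2000, 12, 28, 23, 37, 18, 0, 0, 0))

# formatdate goes the other way, producing an RFC 822-style date string.
as_gmt = formatdate(timestamp, usegmt=True)
```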


MIME messages

The class mimetools.Message is a subclass of rfc822.Message. It provides some
extra methods to help parse content-type and content-transfer-encoding headers.

The method gettype returns the message type (in lowercase) from the content-type
header, or text/plain if no content-type header exists. The methods
getmaintype and getsubtype get the main type and subtype, respectively.

The method getplist returns the parameters of the content-type header as a list
of strings. For parameters of the form name=value, name is converted to lowercase 
but value is unchanged. 

The method getparam(name) gets the first value (from the content-type header) 
for a given name; any quotes or brackets surrounding the value are removed. 

The method getencoding returns the value of the content-transfer-encoding 
header, converted to lowercase. If not specified, it returns 7bit. 

This example scrutinizes some headers from an e-mail message: 

>>> MessageFile=open("message.txt","r")
>>> msg=mimetools.Message(MessageFile)
>>> msg.gettype()
'text/plain'
>>> msg.getmaintype()
'text'
>>> msg.getsubtype()
'plain'
>>> msg.getplist()
['format=flowed']
>>> msg.get("content-type")
'text/plain; format=flowed'
>>> msg.getparam("format")
'flowed'
>>> msg.getencoding()
'7bit'
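The Python 3 email.message.Message class folds these mimetools methods into itself under slightly different names. A sketch, assuming Python 3 and an invented message: there is no getencoding, so reading the Content-Transfer-Encoding header with a 7bit fallback is our own stand-in for it.

```python
from email import message_from_string

RAW = (
    "Content-Type: text/plain; format=flowed\n"
    "Subject: hypothetical example\n"
    "\n"
    "Hello\n"
)
msg = message_from_string(RAW)

content_type = msg.get_content_type()   # lowercase main/sub type
main_type = msg.get_content_maintype()
sub_type = msg.get_content_subtype()
fmt = msg.get_param("format")           # one content-type parameter
params = msg.get_params()               # all parameters as (name, value) pairs
# email has no getencoding(); read the header, falling back to 7bit.
encoding = msg.get("Content-Transfer-Encoding", "7bit").lower()

# With no Content-Type header at all, text/plain is assumed.
bare = message_from_string("Subject: none\n\nHi\n")
```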


Working with MIME Encoding 

Multipurpose Internet Mail Extensions (MIME) are a mechanism for tagging the document
type of a message, or of several parts of one message. (See RFC 1521 for a
full description of MIME.) Several Python modules help handle MIME messages;
most functions you need are there, though they may be spread across libraries.

The module mimetools provides functions to handle MIME encoding. The function
decode(input,output,encoding) decodes from the filelike object input to output,
using the specified encoding. The function encode(input,output,encoding)
encodes. Legal values for encoding are base64, quoted-printable, and uuencode.
These encodings use the modules base64, quopri, and uu, discussed in the section
“Encoding and Decoding Message Data.”

The function choose_boundary returns a unique string for use as a boundary 
between MIME message parts. 

Encoding and decoding MIME messages

The module mimify provides functions to encode and decode messages in MIME
format. The function mimify(input, output) encodes from the filelike object
input into output. Non-ASCII characters are encoded using quoted-printable encoding,
and MIME headers are added as necessary. The function unmimify(input,
output[, decode_base64]) decodes from input into output; if decode_base64 is
true, then any portions of input encoded using base64 are also decoded. You can
pass file names (instead of files) for input and output.

The functions mime_encode_header(line) and mime_decode_header(line)
encode and decode a single string.

The mimify module assumes that any line longer than mimify.MAXLEN (by default,
200) characters needs to be encoded. Also, the variable mimify.CHARSET is a
default character set to fill in if not specified in the content-type header; it defaults
to ISO-8859-1 (Latin-1).
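Python 3 dropped mimify; header encoding of this kind is handled by email.header. As a sketch, assuming Python 3 and treating it only as a rough counterpart to mime_encode_header and mime_decode_header, the following encodes a non-ASCII header value as an RFC 2047 encoded word and decodes it again.

```python
from email.header import Header, decode_header, make_header

original = "Grüße aus Köln"

# Encode a non-ASCII header value as an RFC 2047 "encoded word".
encoded = Header(original, "iso-8859-1").encode()

# decode_header splits it back into (bytes, charset) chunks;
# make_header reassembles them into a Header whose str() is the text.
decoded = str(make_header(decode_header(encoded)))
```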
Parsing multipart MIME messages

A MIME message can have several sections, each with a different content-type. The
sections of a MIME message, in turn, can be divided into smaller subsections. The
multifile module provides a class, MultiFile, to wrap multi-part messages. A
MultiFile behaves like a file, and can treat section boundaries like an EOF.

The constructor has syntax MultiFile(file[,seekable]). Here, file is a filelike
object, and seekable should be set to false for nonseekable objects such as sockets.

Call the method push(str) to set str as the current boundary string; call pop to
remove the current boundary string from the stack. The MultiFile will raise an
error if it encounters an invalid section boundary; for example, if you call
push(X), and then push(Y), and the MultiFile encounters the string X before
seeing Y. A call to next jumps to the next occurrence of the current boundary
string. The attribute level is the current nesting depth.

The read, readline, readlines, seek, and tell methods of a MultiFile operate
on only the current section. For example, seek indices are relative to the start of
the current section, and readlines returns only the lines in the current section.

When you read to the end of a section, the attribute last is set to 1. At this point, it
is not possible to read further, unless you call next or pop.

The method is_data(str) returns false if str might be a section boundary. It is used
as a fast test for section boundaries. The method section_divider(str) converts
str into a section-divider line, by prepending "--". The method end_marker(str)
converts str into an end-marker line, by adding "--" at the beginning and end of str.
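Python 3 removed multifile; the email parser splits multipart messages itself, so there is no boundary stack to push and pop by hand. A sketch, assuming Python 3 and an invented two-part message: walk visits the container and then each part, depth first.

```python
from email import message_from_string

# Hypothetical two-part message with a preamble for non-MIME readers.
RAW = """\
Content-Type: multipart/mixed; boundary="OUTER"

If you can read this, your mailreader can't handle multi-part messages!
--OUTER
Content-Type: text/plain

Hello! This is a text part.
--OUTER
Content-Type: text/html

<html><body>Hello world!</body></html>
--OUTER--
"""

msg = message_from_string(RAW)

# walk() yields the container first, then each part depth-first.
types = [part.get_content_type() for part in msg.walk()]
# For a multipart message, get_payload() returns the list of sub-messages.
bodies = [part.get_payload().strip() for part in msg.get_payload()]
```

The text before the first boundary is kept in msg.preamble, which plays the role of the "text for non-MIME-capable readers" in the listing below.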

Writing out multipart MIME messages

The module MimeWriter provides the class MimeWriter to help write multipart
MIME messages. The constructor takes one argument, an open file (or filelike
object) to write the message to.

To add headers, call addheader(header, value[,prefix]). Here, header is the
header to add, and value is its value. Set the parameter prefix to true to add the new
header at the beginning of the message headers, or false (the default) to append it to
the end. The method flushheaders writes out all accumulated headers; you should
only call it for message parts with an empty body (which, in turn, shouldn't happen).

To write a single-part message, call startbody(content[,plist[,prefix]]) to
construct a filelike object to hold the message body. Here, content is a value for the
content-type header, and plist is a list of additional content-type parameter tuples of
the form (name,value). The parameter prefix defaults to true, and functions as in
addheader.

To write a multipart message, first call startmultipartbody(subtype
[,boundary[,plist[,prefix]]]). The content-type header has main type
“multipart,” subtype subtype, and any extra parameters you pass in plist. For each
part of the message, call nextpart to get a MimeWriter for that part. After writing
all the parts, call lastpart to finish the message off. The call to
startmultipartbody also returns a filelike object; it can be used to store a
message for non-MIME-capable software.
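Python 3 has no MimeWriter; multipart messages are built as a tree of email.mime objects and flattened at the end. The sketch below, assuming Python 3 and using a few placeholder bytes instead of a real GIF file, builds roughly the same structure as Listing 17-2 and parses it back to check the layout. Boundaries are chosen automatically when the message is flattened.

```python
from email import message_from_string
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

outer = MIMEMultipart("mixed")
outer["From"] = "dumplechan@hotmail.com"
outer["To"] = "dave_brueck@hotmail.com"
outer["Subject"] = "Pen-pal greetings (good times!)"
# Text shown only by readers that do not understand MIME:
outer.preamble = "If you can read this, your mailreader can't handle multi-part messages!"

outer.attach(MIMEText("Hello!\nThis is a text part.\n", "plain"))

# A nested multipart holding an HTML page and a (placeholder) GIF image.
inner = MIMEMultipart("mixed")
inner.attach(MIMEText("<html><body>Hello world!</body></html>\n", "html"))
inner.attach(MIMEImage(b"GIF89a placeholder bytes", _subtype="gif"))  # base64 by default
outer.attach(inner)

text = outer.as_string()          # flatten, choosing boundaries
parsed = message_from_string(text)
```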

Note You should not close the filelike objects provided by the MimeWriter, as each
one is a wrapper for the same file.

For example, Listing 17-2 writes out a multipart message and then parses it back 
again. 


Listing 17-2: MimeTest.py 


import MimeWriter
import mimetools
import base64
import multifile

def TestWriting():
    # Write out a multi-part MIME message. The first part is
    # some plain text. The second part is an embedded
    # multi-part message; its two parts are an HTML document
    # and an image.
    MessageFile=open("BigMessage.txt","w")
    msg=MimeWriter.MimeWriter(MessageFile)
    msg.addheader("From","dumplechan@hotmail.com")
    msg.addheader("To","dave_brueck@hotmail.com")
    msg.addheader("Subject","Pen-pal greetings (good times!)")
    # Generate a unique section boundary:
    OuterBoundary=mimetools.choose_boundary()
    # Start the main message body. Write a brief message
    # for non-MIME-capable readers:
    DummyFile=msg.startmultipartbody("mixed",OuterBoundary)
    DummyFile.write("If you can read this, your mailreader\n")
    DummyFile.write("can't handle multi-part messages!\n")
    # Sub-part 1: Simple plaintext message
    submsg=msg.nextpart()
    FirstPartFile=submsg.startbody("text/plain")
    FirstPartFile.write("Hello!\nThis is a text part.\n")
    FirstPartFile.write("It was a dark and stormy night...\n")
    FirstPartFile.write("** TO BE CONTINUED **\n")
    # Sub-part 2: Message with parallel html and image
    submsg2=msg.nextpart()
    # Generate boundary for subparts:
    InnerBoundary=mimetools.choose_boundary()
    submsg2.startmultipartbody("mixed",InnerBoundary)
    submsg2part1=submsg2.nextpart()
    # Sub-part 2.1: HTML page
    SubTextFile=submsg2part1.startbody("text/html")
    SubTextFile.write("<html><title>Hello!</title>\n")
    SubTextFile.write("<body>Hello world!</body></html>\n")
    # Sub-part 2.2: Picture, encoded with base64 encoding
    submsg2part2=submsg2.nextpart()
    submsg2part2.addheader("Content-Transfer-Encoding",
        "base64")
    ImageFile=submsg2part2.startbody("image/gif")
    SourceImage=open("pic.gif","rb")
    base64.encode(SourceImage,ImageFile)
    SourceImage.close()
    # Finish off the sub-message and the main message:
    submsg2.lastpart()
    msg.lastpart()
    MessageFile.close() # all done!

def TestReading():
    MessageFile=open("BigMessage.txt","r")
    # Parse the message boundary using mimetools:
    msg=mimetools.Message(MessageFile)
    OuterBoundary=msg.getparam("boundary")
    reader=multifile.MultiFile(MessageFile)
    reader.push(OuterBoundary)
    print "**Text for non-MIME-capable readers:"
    print reader.read()
    reader.next()
    print "**Text message:"
    print reader.read()
    reader.next()
    # Parse the inner boundary:
    msg=mimetools.Message(reader)
    InnerBoundary=msg.getparam("boundary")
    reader.seek(0) # rewind!
    reader.push(InnerBoundary)
    reader.next() # seek to part 2.1
    print "**HTML page:"
    print reader.read()
    reader.next()
    print "**Writing image to pic2.gif..."
    # seek to start of (encoded) body:
    msg=mimetools.Message(reader)
    msg.rewindbody()
    # decode the image:
    ImageFile=open("pic2.gif","wb")
    base64.decode(reader,ImageFile)
    ImageFile.close()
    MessageFile.close()

if (__name__=="__main__"):
    TestWriting()
    TestReading()
Handling document types

There is no official mapping between MIME types and file extensions. However, the
module mimetypes can make reasonable guesses. The function guess_extension(type)
returns a reasonable extension for files of content-type type, or None if it
has no idea.

The function guess_type(filename) returns a tuple of the form (type, encoding).
Here, type is a content-type that is probably valid, based on the file’s extension. If
guess_type doesn’t have a good guess for type, it returns None. The value encoding
is the name of the encoding program used on the file, or None:

>>> mimetypes.guess_extension("text/plain")
'.txt'
>>> mimetypes.guess_type("fred.txt")
('text/plain', None)
>>> mimetypes.guess_type("Spam.mp3")
(None, None)
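The mimetypes module is still part of Python 3 with the same guess_type call. This sketch, assuming Python 3, shows the encoding half of the returned tuple at work on a compressed file name, and the suffix_map rewriting ".tgz" to ".tar.gz" before the lookup. Modern versions know many more types than Python 2.0 did (for example, .mp3 is now recognized), so results for less common extensions vary by version and by the system's mime.types files.

```python
import mimetypes

# Returns (type, encoding); the encoding comes from a compression suffix.
plain = mimetypes.guess_type("fred.txt")
tarball = mimetypes.guess_type("backup.tar.gz")
# suffix_map turns ".tgz" into ".tar.gz" before the lookup.
tgz = mimetypes.guess_type("backup.tgz")
unknown = mimetypes.guess_type("notes.hypothetical-extension")
```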

You can customize the mapping between extensions and types. Many systems store
files named mime.types to hold this mapping; the mimetypes module keeps a list of
common UNIX paths to such files in knownfiles. The function read_mime_types(filename)
reads mappings from the specified file. Each line of the file should
include a MIME type and then one or more extensions, separated by whitespace.
Listing 17-3 shows a sample mime.types file:


Listing 17-3: sample mime.types file 

text/plain txt
application/mp3 mp3 mp2


The function init([files]) reads mappings from the files in the list files, which
defaults to knownfiles. Files later in the list override earlier files in the case of a
conflict. The module variable inited is true if init has been called; calling init
multiple times is allowed. The following shows an easy way to customize the
mapping:

>>> MyPath="c:\\python20\\mime.types" # (customize this)
>>> mimetypes.init([MyPath]) # old settings may be overridden

You can also directly access the mapping from extensions to encodings
(encodings_map), and the mapping from extensions to MIME types (types_map).
The mapping suffix_map is used to map the extensions .tgz, .taz, and .tz to
.tar.gz.
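Instead of changing the module-wide tables with init, Python 3 code can load extra mappings into a private mimetypes.MimeTypes instance. A sketch, assuming Python 3; the type application/x-spam-recipe and its extensions are invented for illustration, and the text uses the same format as Listing 17-3.

```python
import io
import mimetypes

# Hypothetical mime.types content, in the format of Listing 17-3.
MIME_TYPES_TEXT = """\
# comments and blank lines are ignored
application/x-spam-recipe   spam sprc
"""

# A private MimeTypes instance leaves the module-wide tables untouched.
db = mimetypes.MimeTypes()
db.readfp(io.StringIO(MIME_TYPES_TEXT))

recipe = db.guess_type("lunch.spam")
preferred_ext = db.guess_extension("application/x-spam-recipe")
```

Keeping the custom table in its own object avoids surprising other code that relies on the module-level guess_type.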
Parsing mailcap files

A mailcap (for “mail capability”) file maps document MIME types to commands
appropriate for each type of document. Mailcap files are commonly used on UNIX
systems. (On Windows, file associations are normally stored in the registry.)

See RFC 1524 for a definition of the file format. 


The module mailcap provides functions to help retrieve information from mailcap
files. The function getcaps returns a dictionary of mailcap information. You use it
by passing it to findmatch(caps,MIMEType[,key[,filename[,plist]]]). Here,
caps is the dictionary returned by getcaps, and MIMEType is the type of document
to access. The parameter key is the type of access (such as view, compose, or edit);
it defaults to view. The parameter filename is substituted for %s in the command
line; it defaults to /dev/null. The return value of findmatch is a 2-tuple: the command
line to execute (through os.system, for example) and the mailcap entry for the type,
or (None, None) if no match is found. You can pass a list of extra parameters in plist.
Each entry should take the form name=value, for example, colors=256.

The function getcaps parses /etc/mailcap, /usr/etc/mailcap, /usr/local/etc/mailcap,
and $HOME/.mailcap. The user mailcap file, if any, overrides the system mailcap
settings.


Encoding and Decoding Message Data 

E-mail messages must pass through various systems on their way from one person
to another. Different computers handle data in different (sometimes incompatible)
ways. Therefore, most e-mail programs encode binary data as 7-bit ASCII text. The
encoded file is larger than the original, but is less likely to be mangled in transit.
Python provides modules to help use three such encoding schemes: uuencode,
base64, and quoted-printable.

Uuencode 

The module uu provides functions to encode (binary-to-ASCII) and decode
(ASCII-to-binary) binary files using uuencoding. The function encode(input,
output[,name[,mode]]) uuencodes the file input, writing the resulting output to
the file output. If passed, name and mode are put into the file header as the file name
and permissions.

The function decode(input,output) decodes from the file input to the file output.

For example, the following lines encode a Flash animation file:

>>> source=open("pample2.swf","rb")
>>> destination=open("pample2.uu","w")
>>> uu.encode(source,destination)
>>> destination.close()
In this case, the file must be opened in binary mode ("rb") under Windows or
Macintosh; this is not necessary on UNIX. 


These lines decode the file, and then launch it in a browser window: 


>>> source=open("pample2.uu","r")
>>> destination=open("pample.swf","wb")
>>> uu.decode(source,destination)
>>> destination.close()
>>> webbrowser.open("pample.swf")

Note It is possible to pass file names (instead of open files) to encode or decode.
However, this usage is deprecated.


Base64 

Base64 is another algorithm for encoding binary data as ASCII. The module base64 
provides functions for working with MIME base64 encoding. 

The function encodestring(data) encodes a string of binary data, data, and 
returns a string of base64-encoded data. The function encode(input, output) 
reads data from the filelike object input, and writes an encoded base64 string to the 
filelike object output. 

To decode a base64 string, call decodestring(data). To decode from one filelike
object to another, call decode(input, output).
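In Python 3, encodestring and decodestring were renamed encodebytes and decodebytes (the old names were removed in Python 3.9), and all of these functions work on bytes. The file-to-file encode and decode functions remain, operating on binary file-like objects. A sketch, assuming Python 3, using in-memory files:

```python
import base64
import io

data = b"GIF89a\x00\x01\x02 binary \xff\xfe payload"

# Bytes interface: encodebytes inserts a newline after every 76 output characters.
encoded = base64.encodebytes(data)
decoded = base64.decodebytes(encoded)

# File interface: encode/decode read from one binary file-like object
# and write to another.
source = io.BytesIO(data)
encoded_file = io.BytesIO()
base64.encode(source, encoded_file)

encoded_file.seek(0)
restored = io.BytesIO()
base64.decode(encoded_file, restored)
```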

Base64 is sometimes used to hide data from prying eyes. It is no substitute for 
encryption, but is better than nothing. The code in Listing 17-4 uses base64 to hide 
the files from one directory in another directory: 


Listing 17-4: Conceal.py 


"""Hide files by base64-encoding them. Use Conceal to hide
files, and Reveal to un-hide them."""

import base64
import string
import os

# not ok for filenames:
EvilChars = "/\n"
# not Base64 characters, ok for filenames:
GoodChars = "_ "
TranslateEvil = string.maketrans(EvilChars,GoodChars)
UnTranslateEvil = string.maketrans(GoodChars,EvilChars)

def GetEncodedName(OldName):
    MagicName = base64.encodestring(OldName)
    MagicName = string.translate(MagicName,TranslateEvil)
    return MagicName

def GetDecodedName(OldName):
    MagicName = string.translate(OldName,UnTranslateEvil)
    # Decode the translated name (not the original one):
    MagicName = base64.decodestring(MagicName)
    return MagicName

def Conceal(SourceDir,DestDir):
    """Encode the files in SourceDir as files in DestDir."""
    for FileName in os.listdir(SourceDir):
        FilePath = os.path.join(SourceDir,FileName)
        # Note: need "rb" here! (on UNIX, just "r" is ok)
        InFile=open(FilePath,"rb")
        OutputFilePath=os.path.join(
            DestDir,GetEncodedName(FileName))
        OutFile=open(OutputFilePath,"w")
        base64.encode(InFile,OutFile)
        InFile.close()
        OutFile.close()

def Reveal(SourceDir,DestDir):
    """Decode the files in SourceDir into DestDir."""
    for FileName in os.listdir(SourceDir):
        FilePath = os.path.join(SourceDir,FileName)
        InFile=open(FilePath,"r")
        OutputFilePath=os.path.join(DestDir,GetDecodedName(FileName))
        OutFile=open(OutputFilePath,"wb")
        base64.decode(InFile,OutFile)
        InFile.close()
        OutFile.close()
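The translation tables in Listing 17-4 exist because standard base64 output can contain "/" and a trailing newline. In Python 3, base64.urlsafe_b64encode uses "-" and "_" in place of "+" and "/" and adds no newline, so it can stand in for that step. A sketch of hypothetical Python 3 analogues of GetEncodedName and GetDecodedName, assuming UTF-8 file names:

```python
import base64


def get_encoded_name(old_name):
    """Hypothetical Python 3 analogue of GetEncodedName."""
    # The URL-safe alphabet avoids '/', and no newline is appended,
    # so no translation table is needed.
    return base64.urlsafe_b64encode(old_name.encode("utf-8")).decode("ascii")


def get_decoded_name(magic_name):
    """Hypothetical Python 3 analogue of GetDecodedName."""
    return base64.urlsafe_b64decode(magic_name.encode("ascii")).decode("utf-8")


hidden = get_encoded_name("Secret Plans?.txt")
revealed = get_decoded_name(hidden)
```

One caveat applies to both versions: base64 output is case-sensitive, so on a case-insensitive file system two encoded names could in principle collide.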


Quoted-printable 

Quoted-printable encoding is another scheme for encoding binary data as ASCII
text. It works best for strings with relatively few non-ASCII characters (such as
German text, with occasional umlauts); for binary files such as images, base64 is
more appropriate.

The module quopri provides functions to handle quoted-printable encoding. The
function decode(input, output) decodes from the filelike object input to the filelike
object output. The function encode(input, output, quotetabs) encodes from
input to output. The parameter quotetabs indicates whether embedded spaces and tabs
should be quoted.
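The quopri module is still present in Python 3, where it works on bytes. This sketch, assuming Python 3 and a Latin-1 sample string, shows non-ASCII bytes becoming =XX escapes and the effect of quotetabs on embedded tabs and spaces; it also uses encodestring and decodestring, the in-memory counterparts of encode and decode.

```python
import io
import quopri

text = "Grüße\taus Köln".encode("iso-8859-1")

encoded = quopri.encodestring(text)                   # tabs and spaces left alone
encoded_tabs = quopri.encodestring(text, quotetabs=True)
decoded = quopri.decodestring(encoded)

# File-like interface, mirroring encode(input, output, quotetabs):
out = io.BytesIO()
quopri.encode(io.BytesIO(text), out, quotetabs=False)
```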
Working with UNIX Mailboxes

Many UNIX mail programs store all e-mail in one file or directory called a mailbox.
The module mailbox provides utility classes for parsing such a mailbox. Each class
provides a single method, next, which returns the next rfc822.Message object.
Mailbox parser constructors each take either a file object or directory name as
their only argument. Table 17-2 lists the available mailbox parser classes.


Table 17-2
Mailbox Parsers

Class           Mailbox Type
UnixMailbox     Classic UNIX-style mailbox, as used by elm or pine
MmdfMailbox     MMDF mailbox
MHMailbox       MH mailbox (directory)
Maildir         Qmail mailbox (directory)
BabylMailbox    Babyl mailbox
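Python 3's mailbox module replaces these parser classes with mailbox objects that can be iterated directly; mailbox.mbox corresponds roughly to UnixMailbox. A sketch, assuming Python 3 and writing two invented messages to a temporary mbox file:

```python
import mailbox
import os
import tempfile

# Two hypothetical messages in classic UNIX mbox format.
MBOX_TEXT = """\
From alice@example.com Thu Dec 28 18:23:52 2000
From: alice@example.com
Subject: first

Hello from Alice.

From bob@example.com Thu Dec 28 18:24:00 2000
From: bob@example.com
Subject: second

Hi from Bob.
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "inbox.mbox")
    with open(path, "w") as f:
        f.write(MBOX_TEXT)

    box = mailbox.mbox(path, create=False)
    # Iterating yields one message object per "From " separator line.
    subjects = [message["Subject"] for message in box]
    box.close()
```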


Working with MH mailboxes

The module mhlib provides advanced features for managing MH mailboxes. It
includes three classes: MH represents a collection of mail folders, Folder represents
a single mail folder, and Message represents a single message.


MH objects 

The constructor has the syntax MH([path[,profile]]). You can pass path and/or 
profile to override the default mailbox directory and profile. 

The method openfolder(name) returns a Folder object for the folder name. The
method setcontext(name) sets the current folder to name; getcontext retrieves
the current folder (initially “inbox”).

The method listfolders returns a sorted list of top-level folder names;
listallfolders returns a list of all folder names. listsubfolders(name) returns
a list of immediate child folders of the folder name; listallsubfolders(name)
returns a list of all subfolders of the folder name.

The methods makefolder(name) and deletefolder(name) create and destroy a
folder with the given name.
The method getpath returns the path to the mailbox. The method
getprofile(key) returns the profile entry for key (or None, if none is set). And
the method error(format, arguments) prints the error message (format %
arguments) to stderr.


Folder objects 

The methods getcurrent and setcurrent(index) are accessors for the current
message number. getlast returns the index of the last message (or 0 if there are no
messages). listmessages returns a list of message indices.

The method getsequences returns a dictionary of sequences, where each key is a
sequence name and the corresponding value is a list of the sequence’s message
numbers. putsequences(dict) writes such a dictionary of sequences back to the
sequence files. The method parsesequence(str) parses the string str into a list of
message numbers.

You can delete messages with removemessages(list), or move them to a new
folder with refilemessages(list, newfolder). Here, list is a list of message
numbers on which to operate. You can move one message by calling
movemessage(index, newfolder, newindex), or copy one message by calling
copymessage(index, newfolder, newindex). Here, newindex is the desired
message number in the new folder newfolder.

The path to the folder is accessible through getfullname, while
getsequencesfilename returns the path to the sequences file, and
getmessagefilename(index) returns the full path to message index. The
method error(format, arguments) prints the error message (format %
arguments) to stderr.
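mhlib was removed in Python 3, but mailbox.MH covers much of the same ground with different method names. The sketch below, assuming Python 3 and a throwaway directory, maps a few of the calls described above onto their approximate mailbox.MH counterparts; the comments note the rough mhlib equivalents, which are not exact matches.

```python
import mailbox
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # create=True makes the MH directory if it does not exist yet.
    mh = mailbox.MH(os.path.join(tmp, "Mail"), create=True)
    inbox = mh.add_folder("inbox")            # roughly makefolder + openfolder

    key = inbox.add("From: alice@example.com\nSubject: hi\n\nHello!\n")
    inbox.set_sequences({"unseen": [key]})    # roughly putsequences

    folders = mh.list_folders()               # roughly listfolders
    sequences = inbox.get_sequences()         # roughly getsequences
    message_count = len(inbox)
    subject = inbox[key]["Subject"]
```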

Message objects 

The class mhlib.Message is a subclass of mimetools.Message. It provides one extra
method, openmessage(index), which returns a new Message object for message
number index.


Using Web Cookies 

A cookie is a token used to manage sessions on the World Wide Web. Web servers 
send cookie values to a browser; the browser then regurgitates cookie values when 
it sends a Web request. The module Cookie provides classes to handle cookies. It is 
especially useful for making a robot, as many Web sites require cookies to function
properly.
Cookies

The class SimpleCookie is a dictionary mapping cookie names to cookie values.
Each cookie value is stored as a Cookie.Morsel. You can pass a cookie string (as
received from the Web server) to SimpleCookie’s constructor, or to its load
method.

To retrieve cookie values in a format suitable for inclusion in an HTTP header, call
the method output([attributes[,header[,separator]]]). To retrieve only
some cookie attributes, pass a list of desired attributes in attributes. The parameter
header is the header to use (by default, “Set-Cookie:”). Finally, separator is the
separator to place between cookies (by default, a newline).

For example, the following lines capture cookies as returned from a Web request: 

>>> Request=httplib.HTTP("www.mp3.com")
>>> Request.putrequest("GET",URLString)
>>> Request.endheaders()
>>> Response=Request.getreply()
>>> # Response[2] is the header dictionary
>>> CookieString=Response[2]["set-cookie"]
>>> print CookieString
LANG=eng; path=/; domain=.mp3.com
>>> CookieJar=Cookie.SimpleCookie()
>>> CookieJar.load(CookieString)
>>> print CookieJar.output()
Set-Cookie: LANG=eng; Path=/; Domain=.mp3.com;
>>> print CookieJar.output(["domain"])
Set-Cookie: LANG=eng; Domain=.mp3.com;
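In Python 3 the Cookie module became http.cookies, with the same SimpleCookie class. A sketch, assuming Python 3 and the cookie string from the session above: modern versions sort attributes by name and drop the trailing semicolon, and passing an empty attribute list with header "Cookie:" gives a request-style line.

```python
from http.cookies import SimpleCookie

jar = SimpleCookie()
jar.load("LANG=eng; path=/; domain=.mp3.com")

morsel = jar["LANG"]                         # each value is a Morsel
full = jar.output()                          # every attribute, "Set-Cookie:" header
domain_only = jar.output(attrs=["domain"])
request_style = jar.output(attrs=[], header="Cookie:")
```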

The method js_output([attributes]) also outputs cookies, this time in the
form of a JavaScript snippet to set their values.


Morsels 

A morsel stores a cookie name in the attribute key, its value in the attribute value,
and its coded value (suitable for sending) in the attribute coded_value. The convenience
method set(key, value, coded_value) sets all three attributes.

Morsels provide output and js_output methods mirroring those of their owning
cookie; they also provide an OutputString([attributes]) method that returns
the morsel as a human-readable string.

A morsel also functions as a dictionary, whose keys are cookie attributes (expires,
path, comment, domain, max-age, secure, and version). The method
isReservedKey(key) tests whether key is one of the reserved cookie attributes.
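The Morsel class in Python 3's http.cookies keeps the set, OutputString, and isReservedKey methods described above. A sketch, assuming Python 3: the invented attribute name "flavor" is rejected because morsel keys are limited to the reserved cookie attributes.

```python
from http.cookies import CookieError, Morsel

m = Morsel()
m.set("LANG", "eng", "eng")      # key, value, coded_value
m["path"] = "/"                  # attributes are dictionary keys

line = m.OutputString()          # no "Set-Cookie:" prefix
is_reserved = m.isReservedKey("domain")

try:
    m["flavor"] = "oatmeal"      # not a cookie attribute
except CookieError:
    rejected = True
else:
    rejected = False
```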
When sending cookies in an HTTP request, you should only send cookies whose
domain is a substring of the host's name. Otherwise, you might confuse the host.
Or, you may send it information it shouldn't know about, such as passwords for an
unrelated site. Moreover, be aware that the Cookie class only handles one value
for a given name; setting a new value for that name overwrites the old one.


Example: a cookie importer 

The code in Listing 17-5 provides functions to import cookies from Internet 
Explorer 5.0 or Netscape. 


Listing 17-5: CookieMonster.py 


import Cookie
import os

def AddMorsel(CookieJar,CookieName,CookieValue,HostString):
    # Cookie set expects a string, so CookieJar["name"]="value"
    # is ok, but CookieJar["name"]=Morsel is not ok.
    # But, cookie get returns a Morsel:
    CookieJar[CookieName]=CookieValue
    CookieJar[CookieName]["domain"]=HostString

def ParseNetscapeCookies(filename):
    # Netscape stores cookies in one tab-delimited file,
    # starting after a four-line header
    CookieFile=open(filename)
    CookieLines=CookieFile.readlines()[4:]
    CookieFile.close()
    CookieJar=Cookie.SimpleCookie()
    for CookieLine in CookieLines:
        CookieParts = CookieLine.strip().split('\t')
        # Skip blank or malformed lines (a cookie line has 7 fields):
        if len(CookieParts)<7:
            continue
        AddMorsel(CookieJar,CookieParts[-2],
            CookieParts[-1],CookieParts[0])
    return CookieJar

def ParseIECookies(dir):
    CookieJar=Cookie.SimpleCookie()
    for FileName in os.listdir(dir):
        # Skip non-cookie files:
        if len(FileName)<3 or FileName[-3:].upper()!="TXT":
            continue
        CookieFile=open(os.path.join(dir,FileName))
        CookieLines=CookieFile.readlines()
        CookieFile.close()
        LineIndex=0
        while (LineIndex+2)<len(CookieLines):
            # [:-1] removes trailing newline
            CookieName=CookieLines[LineIndex][:-1]
            CookieValue=CookieLines[LineIndex+1][:-1]
            HostString=CookieLines[LineIndex+2][:-1]
            AddMorsel(CookieJar,CookieName,
                CookieValue,HostString)
            LineIndex+=9
    return CookieJar

def OutputForHost(CookieJar,Host,attr=None,
        header="Set-Cookie:",sep="\n"):
    # Return only cookie values matching the specified host.
    CookieHeader=""
    for OneMorsel in CookieJar.values():
        MorselHost=OneMorsel.get("domain",None)
        if (MorselHost==None or Host.find(MorselHost)!=-1):
            CookieHeader+=OneMorsel.output(attr,header)+sep
    return CookieHeader

if (__name__=="__main__"):
    Cookies=ParseIECookies(
        "C:\\Documents and Settings\\Administrator\\Cookies\\")
    print OutputForHost(Cookies,"www.thestreet.com/")
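OutputForHost matches a cookie whenever its domain appears anywhere in the host string, so a host such as www.thestreet.com.evil.example would also receive cookies set for .thestreet.com. The sketch below is a hypothetical Python 3 variant (output_for_host is our own name) that instead requires the host to equal the cookie's domain or end with "." plus that domain, and emits request-style "Cookie:" lines with no attributes. It is a simplification of real browser matching rules, not a complete implementation.

```python
from http.cookies import SimpleCookie


def output_for_host(jar, host, header="Cookie:", sep="\r\n"):
    """Hypothetical variant of OutputForHost using suffix matching."""
    host = host.lower()
    lines = []
    for morsel in jar.values():
        domain = morsel["domain"].lower().lstrip(".")
        # Send the cookie if it has no domain, or if the host is the
        # domain itself or ends with "." + domain.
        if not domain or host == domain or host.endswith("." + domain):
            lines.append(morsel.output(attrs=[], header=header))
    return sep.join(lines)


jar = SimpleCookie()
jar["LANG"] = "eng"
jar["LANG"]["domain"] = ".mp3.com"
jar["SESSION"] = "abc123"
jar["SESSION"]["domain"] = ".thestreet.com"

for_street = output_for_host(jar, "www.thestreet.com")
for_lookalike = output_for_host(jar, "www.thestreet.com.evil.example")
```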


Summary 

Python’s standard libraries help with many common tasks in Internet programming.
In this chapter, you:

♦ Parsed robots.txt to create a well-behaved robo-browser.

♦ Handled various e-mail headers.

♦ Imported cookies from a browser cache.

In the next chapter, you learn simple, powerful ways to make your Python programs
parse HTML and XML.






Parsing XML and Other Markup Languages


Markup languages are a powerful way to store text, complete with formatting and
metadata. HTML is the format for about half a billion pages on the World Wide Web.
Extensible Markup Language (XML) promises to facilitate data exchange of all types.

Python includes standard libraries to parse HTML and XML.
This chapter shows you how to use these libraries to create a
Web robot, a data importer/exporter, and more.
Markup Language Basics

HyperText Markup Language, or HTML, is used for nearly all
the pages on the World Wide Web. It defines tags that control how
a browser formats text, graphics, and so forth.

Extensible Markup Language, or XML, is a tool for data
exchange. It includes metadata tags to explain what text items
mean. For instance, a person (or program) reading the
number “120/80” might not know that it represents a blood
pressure, but XML can include tags to make this clear:

<blood-pressure>120/80</blood-pressure>


Standard Generalized Markup Language, or SGML, is very general
and rarely used.
Tags are for metatext

Markup languages are a way to store text together with tags. Tags are metatext that
govern the text’s formatting or describe its meaning. Tags are enclosed in angle
brackets <like this>. An opening tag has a corresponding closing tag, which includes a
forward slash </like this>. The text between (inside) the tags is the text they describe or
modify. For example, the following HTML fragment formats a sentence:

Presentation tags can set <b>bold</b> type or <i>italics</i>

Tags may have attributes to refine their meanings. For example, in HTML, the font
tag sets the font, and the color attribute specifies the desired font color:

<FONT COLOR=#FFFFFF>white text</FONT>

In XML, the information contained between a start tag and its end tag is called an
element. Elements store data, and may contain sub-elements. For an element with no
content, the start and end tags may be collapsed into a single tag:

<blood type="A" color="red" />

XML data can be stored in the element attributes, or in text. For example, these
lines are both reasonable ways to store a person’s name:

<Person name="Bob Hope" />

<Person>Bob Hope</Person>

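Either form is easy to read back. As a minimal sketch, assuming Python 3 and the xml.dom.minidom module described later in this chapter, the hypothetical helper person_name below accepts both variants:

```python
from xml.dom import minidom

def person_name(xml_text):
    """Return a Person's name whether it is stored as an attribute or as text."""
    person = minidom.parseString(xml_text).documentElement
    if person.hasAttribute("name"):
        return person.getAttribute("name")
    # Otherwise, join the text children of the element
    return "".join(node.data for node in person.childNodes
                   if node.nodeType == node.TEXT_NODE)

from_attribute = person_name('<Person name="Bob Hope" />')
from_text = person_name('<Person>Bob Hope</Person>')
print(from_attribute, "|", from_text)
```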
Tag rules

In XML, each start tag must have a corresponding end tag. This is a good idea in
HTML as well. Many HTML documents do not close all their tags; however, the
World Wide Web Consortium (W3C) has proposed a new standard, XHTML, that
requires an end tag for each start tag.

Tags may be nested within other tags. It is best to close a child tag before closing
its parent tag. This is mandatory in XML. It is recommended in HTML, as bad nesting
may make a Web page render badly:

<b>I'm not dead <i>yet</b></i>      Bad!

<b>I'm not dead <i>yet</i></b>      Good!

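An XML parser enforces the nesting rule strictly. A quick check, assuming Python 3's xml.dom.minidom (which is built on the Expat parser); the helper is_well_formed is illustrative only:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        minidom.parseString(fragment)
        return True
    except ExpatError:
        return False

bad = is_well_formed("<b>I'm not dead <i>yet</b></i>")
good = is_well_formed("<b>I'm not dead <i>yet</i></b>")
print("bad nesting parses:", bad)    # False
print("good nesting parses:", good)  # True
```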
The available tags in HTML are described in the HTML standard. The available tags
in XML vary from file to file: because XML is an extensible markup language, you
extend it by adding new tags. A Document Type Definition, or DTD, lists the available
tags for an XML document. A DTD also includes rules for tag placement, such as which
tags are parents of other tags.
Namespaces

XML files can organize tag and attribute names into namespaces. A name within a
namespace takes the form NamespacePrefix:Name. For example, this tag’s local
name is Name, and its namespace prefix is Patient:

<Patient:Name>Alfred</Patient:Name>

A namespace prefix is mapped to a particular URI by an xmlns:Prefix attribute on the
element or one of its ancestors; the URI is often the URL of a Web page explaining the
namespace. In general, when parsing XML, you can ignore namespaces. But they are a
handy tool for designing a good XML DTD.
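To see the parts of a qualified name, here is a minimal Python 3 sketch with xml.dom.minidom. The prefix must be declared before use, and the URI http://example.com/patient is a made-up placeholder:

```python
from xml.dom import minidom

# The xmlns:Patient declaration maps the prefix to a (hypothetical) URI.
XML_TEXT = ('<record xmlns:Patient="http://example.com/patient">'
            '<Patient:Name>Alfred</Patient:Name></record>')

doc = minidom.parseString(XML_TEXT)
name_node = doc.getElementsByTagName("Patient:Name")[0]
print(name_node.prefix)        # Patient
print(name_node.localName)     # Name
print(name_node.namespaceURI)  # http://example.com/patient
```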

Processing XML

There are two main ways of processing XML. You can parse the entire document
into memory, and navigate the tree of tags and attributes at your leisure. The
Document Object Model (DOM) API is an interface for such a parser. Or, you can
perform event-driven parsing, handling each tag as you read it from the file. The
Simple API for XML (SAX) is an interface for such a parser. (The module xmllib is
also an event-driven parser.)

Of the two interfaces, I find DOM to be the easier. Also, DOM can change an XML
file without doing direct string manipulation, which gives it big points in my book.
One disadvantage of DOM is that it must read the entire XML file into memory
up front, so SAX may be a better choice if you must parse mammoth XML files. Both
interfaces are very rich, offering more features than you are likely to need or want;
this chapter covers only the core of the two parsing APIs.

In order to process XML with Python, you will need a third-party XML parser. The
Python distribution for Windows currently includes the Expat non-validating parser.
But on UNIX, you will need to build the Expat library, and make sure that the pyexpat
module is built as well.


Parsing HTML Files

The module htmllib defines the HTMLParser class. You create a subclass of
HTMLParser to build your own HTML parser. The HTMLParser class is itself a
subclass of sgmllib.SGMLParser, but you will probably never use the superclass
directly.

The HTMLParser constructor takes a formatter, as defined in the formatter module.
(See Chapter 17 for information about formatter.) The formatter is used to
output the text in the HTML stream. The member formatter is a reference to the
parser’s formatter. If you don’t need to use a formatter, you can use a null formatter,
as the following subclass does:
import htmllib
import formatter

class SimpleHTMLParser(htmllib.HTMLParser):
    def __init__(self):
        # initialize the superclass
        htmllib.HTMLParser.__init__(self,
            formatter.NullFormatter())
    # ... override other methods here ...
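The htmllib module does not exist in Python 3, and the formatter module was later removed as well. The closest standard-library successor is html.parser.HTMLParser, which takes no formatter at all. A rough Python 3 counterpart to the subclass above, offered as a swapped-in alternative rather than a translation of the htmllib API:

```python
from html.parser import HTMLParser

class SimpleHTMLParser(HTMLParser):
    """Python 3 skeleton: collects the page text; no formatter is involved."""
    def __init__(self):
        # convert_charrefs=True turns &amp; and friends into characters
        # before handle_data sees the text
        super().__init__(convert_charrefs=True)
        self.text_chunks = []

    def handle_data(self, data):
        self.text_chunks.append(data)

parser = SimpleHTMLParser()
parser.feed("<p>Hello, <b>world</b>!</p>")
parser.close()
print("".join(parser.text_chunks))  # Hello, world!
```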

HTMLParser methods

Call the method feed(text) to send the HTML string text into the parser. You can
feed the parser an entire file at one time, or one piece at a time; its behavior is the
same. The reset method causes the parser to forget everything it was doing and
start over. The close method finishes off the current file; it has the same effect as
feeding an end-of-file marker to the parser. If you override close, your subclass’s
close method should call the close method of the superclass.

The method get_starttag_text returns the text of the most recently opened tag.
The method setnomoretags tells the parser to stop processing tags. Similarly, the
method setliteral tells the parser to treat the following text literally (ignoring tags).

Handling tags

To handle a particular tag, define start_xxx and end_xxx methods in your class,
where xxx is the tag (in lowercase). A start_xxx method takes one parameter: a
list of name-value pairs corresponding to the HTML tag’s attributes. An end_xxx
method takes no arguments.

You can also handle a tag with a method of the form do_xxx(attributes). The do
method is called only if start and end methods are not defined.

For example, the following method prints the name of any background image for
the page, as defined in a <BODY> tag:


def do_body(self,args):
    # (assumes the string module is imported at module level)
    for ValTuple in args:
        # convert argname to uppercase
        if string.upper(ValTuple[0])=="BACKGROUND":
            print "Page background image:",ValTuple[1]
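Python 3's html.parser reports every tag through a single handle_starttag(tag, attrs) method rather than per-tag methods. If you like the start_xxx/end_xxx convention, a small dispatcher can recreate it. DispatchingParser and BackgroundFinder are hypothetical helpers, not part of any library; the second one mirrors the do_body example above:

```python
from html.parser import HTMLParser

class DispatchingParser(HTMLParser):
    """Route each tag to start_<tag>/end_<tag> methods, htmllib style."""
    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()

class BackgroundFinder(DispatchingParser):
    def __init__(self):
        super().__init__()
        self.backgrounds = []

    def start_body(self, attrs):
        for name, value in attrs:
            if name.upper() == "BACKGROUND":
                self.backgrounds.append(value)

finder = BackgroundFinder()
finder.feed('<HTML><BODY BACKGROUND="stars.gif">Hi</BODY></HTML>')
finder.close()
print(finder.backgrounds)  # ['stars.gif']
```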

Other parsing methods

The method handle_data(data) is called to handle standard text that is not part
of a tag. Note that handle_data may be called one or several times for one contiguous
“block” of data.

The method anchor_bgn(href, name, type) is called for the start of an anchor
tag, <a>. The method anchor_end is called at the end of an anchor. By default,
these methods build up a list of links in the member anchorlist.
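For comparison, here is a Python 3 sketch of that default link-collecting behavior built on html.parser. The attribute name anchorlist is borrowed from htmllib purely for familiarity; html.parser has no such member of its own:

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, like htmllib's anchorlist."""
    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip named anchors such as <a name=...>
                self.anchorlist.append(href)

collector = AnchorCollector()
collector.feed('<a href="one.html">1</a> <a name="top">x</a> '
               '<a href="two.html">2</a>')
collector.close()
print(collector.anchorlist)  # ['one.html', 'two.html']
```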
The method handle_image(source, alt[, ismap[, align[, width[, height]]]])
is called when an image is encountered. The default implementation simply hands
the string alt over to handle_data.

The method save_bgn starts storing data, instead of sending it to the formatter via
handle_data. The method save_end returns all the data buffered since the call to
save_bgn. These calls may not be nested, and save_end may not be called before
save_bgn.

If a tag handler (of the form start_xxx or do_xxx) is defined for a tag, the method
handle_starttag(tag, method, attributes) is called. The parameter tag is the
tag name (in lowercase), and method is the start or do method for the tag. By
default, handle_starttag calls method, passing attributes.

Similarly, the method handle_endtag(tag, method) is called for a tag if you have
defined an end method for that tag.

The method handle_charref(ref) processes character references of the form
&#ref;. By default, ref is interpreted as a character code from 0 to 255, and
handed over to handle_data.

The method handle_entityref(ref) processes entity references of the form
&ref;. By default, it looks at the attribute entitydefs, which should be a dictionary
mapping from entity names to meanings. The variable htmlentitydefs.entitydefs
defines the default entity definitions for HTMLParser. For example,
the codes &amp;, &apos;, &gt;, &lt;, and &quot; translate into the characters & ' > < ".
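In Python 3 the entity table lives in html.entities (the renamed htmlentitydefs), and html.unescape applies it, along with numeric character references, in a single call. A brief illustration:

```python
import html
from html.entities import entitydefs

# entitydefs maps entity names to the characters they stand for
print(entitydefs["amp"], entitydefs["lt"], entitydefs["gt"], entitydefs["quot"])

decoded = html.unescape("&amp; &gt; &lt; &quot; &#65;")
print(decoded)  # & > < " A
```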

The method handle_comment(commenttext) is called when a comment of the
form <!--commenttext--> is encountered.

The attribute nofill is a flag governing the handling of whitespace. Normally, nofill
is false, which causes whitespace to be collapsed. It affects the behavior of
handle_data and save_end.

Handling unknown or bogus elements

The HTMLParser defines methods to handle unknown HTML elements. By default,
these methods do nothing; you may want to override them (to report an error, for
example).

The method unknown_starttag(tag, attributes) is called when a tag with no
start method is encountered. (For a given tag, either handle_starttag or
unknown_starttag is called.) The method unknown_endtag(tag) is called for
unknown end tags. The methods unknown_charref(ref) and
unknown_entityref(ref) handle unknown character and entity references, respectively.

The method report_unbalanced(tag) is called if the parser encounters a closing
tag for tag with no corresponding opening tag.
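To illustrate what report_unbalanced is for, here is a small Python 3 checker built on html.parser. It keeps a stack of open tags and records any closing tag that has no matching opener. BalanceChecker, its unbalanced list, and the VOID_TAGS set are hypothetical; html.parser itself has no report_unbalanced hook. Void elements such as <br> are skipped because they never get an end tag:

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class BalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.unbalanced = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <tag/> opens and closes itself

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to the matching opener (implicitly closing any children)
            while self.open_tags.pop() != tag:
                pass
        else:
            self.unbalanced.append(tag)  # closing tag with no opener

checker = BalanceChecker()
checker.feed("<p>Hello</b> there<br/></p>")
checker.close()
print(checker.unbalanced)  # ['b']
```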
Example: Bold Only

Listing 18-1 illustrates a simple subclass of HTMLParser that passes through only the
bold text from an HTML stream. Listing 18-2 shows sample output from the parser.


Listing 18-1: BoldOnly.py


import htmllib
import formatter

TEST_HTML_STRING="""<html>
<title>A poem</title>
There once was a <b>poet named Dan</b><br>
Who could not make <b>limericks</b> scan<br>
He'd be doing just fine<br>
Till the <b>very last line</b>
Then he'd squeeze in <b>too many syllables</b>
and it wouldn't even rhyme<br>
</html>"""

class PrintBoldOnly(htmllib.HTMLParser):
    def __init__(self):
        # AbstractFormatter hands off text to the writer.
        htmllib.HTMLParser.__init__(self,
            formatter.AbstractFormatter(formatter.DumbWriter()))
        self.Printing=0 # don't print until we see bold
    # Note: The bold tag <b> takes no attributes, so the
    # attributes parameter for start_b will always be an
    # empty list
    def start_b(self,attributes):
        self.Printing=1
    def end_b(self):
        self.Printing=0
    def handle_data(self,text):
        if (self.Printing):
            # Call superclass method, pass text to formatter:
            htmllib.HTMLParser.handle_data(self,text)

if (__name__=="__main__"):
    Test=PrintBoldOnly()
    Test.feed(TEST_HTML_STRING)
    Test.close()
Listing 18-2: BoldOnly output


poet named Dan
limericks

very last line too many syllables
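Because htmllib is gone in Python 3, here is a rough equivalent of Listing 18-1 using html.parser. Instead of a formatter and writer it simply collects each run of bold text in a list. That is a simplification made for this sketch, not the listing's own behavior:

```python
from html.parser import HTMLParser

POEM = """<html><title>A poem</title>
There once was a <b>poet named Dan</b><br>
Who could not make <b>limericks</b> scan<br>
He'd be doing just fine<br>
Till the <b>very last line</b>
Then he'd squeeze in <b>too many syllables</b>
and it wouldn't even rhyme<br>
</html>"""

class BoldOnly(HTMLParser):
    def __init__(self):
        super().__init__()
        self.printing = False  # only keep text inside <b>...</b>
        self.bold_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.printing = True

    def handle_endtag(self, tag):
        if tag == "b":
            self.printing = False

    def handle_data(self, data):
        if self.printing:
            self.bold_text.append(data)

parser = BoldOnly()
parser.feed(POEM)
parser.close()
print(parser.bold_text)
```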


Example: Web Robot 

A robot is a program that browses the World Wide Web automatically. Listing 18-3 is
a simple robot. It follows links between pages, and saves pages to the local disk. It
overrides several methods of the HTMLParser in order to follow various links.


Listing 18-3: Robot.py


import htmllib
import formatter
import urlparse
import re
import os
import string
import urllib

# Redefine this to a directory where you want to put files
ROOT_DIR = "c:\\python20\\robotfiles\\"

# Web page file extensions that usually return HTML
HTML_EXTENSION_DICT={"":1,"HTM":1,"HTML":1,"PHTML":1,"SHTML":1,
    "PHP":1,"PHP3":1,"HTS":1,"ASP":1,"PL":1,"JSP":1,"CGI":1}

# Use this string to limit the robot to one site: only URLs
# that contain this string will be retrieved. If this is null,
# the robot will attempt to pull down the whole WWW.
REQUIRED_URL_STRING="kibo.com"

# Compile a regular expression for case insensitive matching of
# the required string
RequiredUrlRE = re.compile(re.escape(REQUIRED_URL_STRING),
    re.IGNORECASE)

# Keep track of all the pages we have visited in a dictionary,
# so that we don't hit the same page repeatedly.
VisitedURLs={}

# Queue of target URLs
TargetURLList=["http://www.kibo.com/index.html"]

def AddURLToList(NewURL):
    # Skip duplicate URLs (already visited, or already queued)
    if (VisitedURLs.has_key(NewURL) or NewURL in TargetURLList): return
    # Skip URLs that don't contain the proper substring
    if (not RequiredUrlRE.search(NewURL)): return
    # Add URL to the target list
    TargetURLList.append(NewURL)

# Chop file extension from the end of a URL
def GetExtensionFromString(FileString):
    DotChunks=string.split(FileString,".")
    if len(DotChunks)==1: return ""
    LastBlock=DotChunks[-1] # Take stuff after the last .
    if string.find(LastBlock,"/")!=-1:
        return ""
    if string.find(LastBlock,"\\")!=-1:
        return ""
    return string.upper(LastBlock)

class HTMLRobot(htmllib.HTMLParser):
    def StartNewPage(self,BaseURL):
        self.BaseURL=BaseURL
    def __init__(self):
        # Initialize the master class
        htmllib.HTMLParser.__init__(
            self,formatter.NullFormatter())
    def do_body(self,args):
        # Retrieve background image, if any
        for ValTuple in args:
            if string.upper(ValTuple[0])=="BACKGROUND":
                ImageURL = urlparse.urljoin(
                    self.BaseURL, ValTuple[1])
                AddURLToList(ImageURL)
    def do_embed(self,args):
        # Handle embedded content
        for ValTuple in args:
            if string.upper(ValTuple[0])=="SRC":
                self.anchor_bgn(ValTuple[1],None,None)
    def do_area(self,args):
        # Handle areas inside an imagemap
        for ValTuple in args:
            if string.upper(ValTuple[0])=="HREF":
                self.anchor_bgn(ValTuple[1],None,None)
    def handle_image(self, source, alt, ismap,
            align, width, height):
        # Retrieve images
        ImageURL = urlparse.urljoin(self.BaseURL, source)
        AddURLToList(ImageURL)
    def anchor_bgn(self,TempURL,name,type):
        # Anchors (links). Skip mailto links.
        if TempURL[0:7].upper() == "MAILTO:": return
        NewURL=urlparse.urljoin(self.BaseURL, TempURL)
        AddURLToList(NewURL)
    def do_frame(self, args):
        # Handle a subframe as a link
        for ValTuple in args:
            if string.upper(ValTuple[0])=="SRC":
                self.anchor_bgn(ValTuple[1],None,None)
    def do_option(self,args):
        for ValTuple in args:
            if string.upper(ValTuple[0])=="VALUE":
                # This might be a Web page...
                TheExtension = \
                    GetExtensionFromString(ValTuple[1])
                if HTML_EXTENSION_DICT.has_key(TheExtension):
                    self.anchor_bgn(ValTuple[1],None,None)

if (__name__=="__main__"):
    Parser = HTMLRobot()
    while (len(TargetURLList)>0):
        # Take the next URL off the list
        NextURL = TargetURLList[0]
        del TargetURLList[0]
        VisitedURLs[NextURL] = 1 # flag as visited
        print "Retrieving:",NextURL
        # Parse the URL, and decide whether
        # we think it's HTML or not:
        URLTuple=urlparse.urlparse(NextURL,"http",0)
        TheExtension=GetExtensionFromString(URLTuple[2])
        # Get a local filename; make directories as needed
        TargetPath = os.path.normpath(ROOT_DIR+URLTuple[2])
        # If no extension, assume it's a directory and
        # retrieve index.html.
        if (TheExtension==""):
            TargetDir=TargetPath
            TargetPath=os.path.normpath(
                TargetPath+"/index.html")
        else:
            (TargetDir,TargetFile) = os.path.split(TargetPath)
        try:
            os.makedirs(TargetDir)
        except:
            pass # Ignore exception if directory exists
        if HTML_EXTENSION_DICT.has_key(TheExtension):
            # This is HTML - retrieve it to disk and then
            # feed it to the parser
            URLFile=urllib.urlopen(NextURL)
            HTMLText = URLFile.read()
            URLFile.close()
            HTMLFile=open(TargetPath,"w")
            HTMLFile.write(HTMLText)
            HTMLFile.close()
            Parser.StartNewPage(NextURL)
            Parser.feed(HTMLText)
            Parser.close()
        else:
            # This isn't HTML - save to disk
            urllib.urlretrieve(NextURL,TargetPath)
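The robot's link handling can be exercised without touching the network. Here is a Python 3 sketch that uses urllib.parse.urljoin in place of the old urlparse module and html.parser in place of htmllib. It resolves the links on one page against the page's URL and keeps only those containing the required substring. LinkGatherer, its LINK_ATTRS table, and the sample page are illustrative assumptions:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

REQUIRED_URL_STRING = "kibo.com"
RequiredUrlRE = re.compile(re.escape(REQUIRED_URL_STRING), re.IGNORECASE)

class LinkGatherer(HTMLParser):
    """Resolve link-bearing attributes against a base URL."""
    LINK_ATTRS = {"a": "href", "area": "href", "img": "src",
                  "frame": "src", "embed": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.targets = []

    def handle_starttag(self, tag, attrs):
        wanted = self.LINK_ATTRS.get(tag)
        value = dict(attrs).get(wanted) if wanted else None
        if not value or value.lower().startswith("mailto:"):
            return
        url = urljoin(self.base_url, value)
        if RequiredUrlRE.search(url) and url not in self.targets:
            self.targets.append(url)

page = ('<a href="faq.html">FAQ</a> <img src="/img/kibo.gif">'
        '<a href="mailto:kibo@kibo.com">mail</a>'
        '<a href="http://elsewhere.org/">away</a>')
gatherer = LinkGatherer("http://www.kibo.com/index.html")
gatherer.feed(page)
gatherer.close()
print(gatherer.targets)
```

The mailto link is skipped and the off-site link fails the substring test, so only the two kibo.com URLs remain.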


Parsing XML with SAX

SAX is a standard interface for event-driven XML parsing. Parsers that implement
SAX are available in Java, C++, and (of course) Python. The module xml.sax is the
overseer of SAX parsers.

The method xml.sax.parse(xmlfile, contenthandler[, errorhandler])
creates a SAX parser and parses the specified XML. The parameter xmlfile can be
either a file or the name of a file to read from. The parameter contenthandler must
be a ContentHandler object. If specified, errorhandler must be a SAX ErrorHandler
object. If no error handler is provided, the parser raises a SAXParseException
when it encounters an error. Similarly, the method
xml.sax.parseString(xmlstring, contenthandler[, errorhandler]) parses XML
from the supplied string xmlstring.

Parsing XML with SAX generally requires you to create your own ContentHandler,
by subclassing xml.sax.ContentHandler. Your ContentHandler handles the particular
tags and attributes of your flavor(s) of XML.
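A minimal Python 3 sketch of that pattern: a ContentHandler subclass (the name TagCounter is hypothetical) that counts elements by tag name, fed through xml.sax.parseString. The XML is passed as bytes, which every Python 3 version accepts; str input is accepted from Python 3.7 on:

```python
import xml.sax
from collections import Counter

class TagCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, tag, attributes):
        self.counts[tag] += 1

handler = TagCounter()
xml.sax.parseString(b"<exam><patient>Pat</patient><test/><test/></exam>",
                    handler)
print(dict(handler.counts))  # {'exam': 1, 'patient': 1, 'test': 2}
```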

Using a ContentHandler

A ContentHandler object provides methods to handle various parsing events. Its
owning parser calls ContentHandler methods as it parses the XML file. The method
setDocumentLocator(locator) is normally called first. The methods
startDocument and endDocument are called at the start and the end of the XML
file. The method characters(text) is passed character data of the XML file via
the parameter text.

The ContentHandler is called at the start and end of each element. If the parser is
not in namespace mode, the methods startElement(tag, attributes) and
endElement(tag) are called; otherwise, the corresponding methods
startElementNS and endElementNS are called. Here, tag is the element tag, and
attributes is an Attributes object. (In namespace mode, each element name arrives
as a (URI, localname) tuple, followed by an extra qname parameter for the qualified name.)
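To watch the order in which a parser invokes these methods, a throwaway handler can log each call. This Python 3 sketch (EventLogger is a hypothetical name) uses the default non-namespace mode:

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, tag, attributes):
        self.events.append("start " + tag)

    def endElement(self, tag):
        self.events.append("end " + tag)

    def characters(self, text):
        self.events.append("text " + repr(text))

logger = EventLogger()
xml.sax.parseString(b"<exam><bloodtype>B</bloodtype></exam>", logger)
for event in logger.events:
    print(event)
```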
The methods startPrefixMapping(prefix, URI) and endPrefixMapping(prefix)
are called for each namespace mapping; normally, namespace processing is
handled by the XMLReader itself. For a given prefix, endPrefixMapping will be
called after the corresponding call to startPrefixMapping, but otherwise the
order of calls is not guaranteed.
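With namespace processing switched on, the NS variants and the prefix-mapping callbacks come into play. A Python 3 sketch, assuming the made-up namespace URI http://example.com/exam. Note that startElementNS receives a (URI, localname) tuple; its second argument, the qualified name, may be None depending on the parser's settings, so this handler ignores it:

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces

class NSLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("map", prefix, uri))

    def endPrefixMapping(self, prefix):
        self.events.append(("unmap", prefix))

    def startElementNS(self, name, qname, attributes):
        # name is a (namespace URI, local name) tuple
        self.events.append(("start", name))

XML_BYTES = (b'<p:exam xmlns:p="http://example.com/exam">'
             b'<p:patient>Pat</p:patient></p:exam>')

parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # namespace mode on
logger = NSLogger()
parser.setContentHandler(logger)
parser.parse(io.BytesIO(XML_BYTES))
for event in logger.events:
    print(event)
```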

The method ignorableWhitespace(spaces) is called for a string spaces of
whitespace. The method processingInstruction(target, text) is called when
a processing instruction (other than an XML declaration) is encountered. The
method skippedEntity(entityname) is called when the parser skips any entity.

A ContentHandler receives an Attributes object in calls to the startElement
method. The Attributes object wraps a dictionary of attribute names (keys) and their
values. The method getLength returns the number of attributes. The methods items,
keys, has_key, and values wrap the corresponding dictionary methods. The
method getValue(name) returns the value for an attribute name; if namespaces
are active, the method getValueByQName(name) returns the value for a qualified
attribute name.
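A short sketch of the Attributes methods in Python 3's xml.sax. Python 3 dropped has_key, so the membership test below uses the in operator instead; AttributeReader is a hypothetical handler name:

```python
import xml.sax

class AttributeReader(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0
        self.found = {}
        self.has_rh = None

    def startElement(self, tag, attributes):
        if tag == "blood":
            self.count = attributes.getLength()
            self.found = {name: attributes.getValue(name)
                          for name in attributes.getNames()}
            self.has_rh = "rh" in attributes  # membership replaces has_key

reader = AttributeReader()
xml.sax.parseString(b'<blood type="A" color="red" />', reader)
print(reader.count, reader.found, reader.has_rh)
```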

Example: blood-type extractor

Listing 18-4 uses a SAX parser to extract a patient’s blood type from the same exam
data XML used in Listing 18-6.


Listing 18-4: BloodTypeSax.py


import xml.sax
import cStringIO

SAMPLE_DATA = """<?xml version="1.0"?>
<exam date="12/11/99">
<patient>Pat</patient>
<bloodtype>B</bloodtype>
</exam>"""

class ExamHandler(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.CurrentData=""
        self.BloodType=""
    def characters(self,text):
        if self.CurrentData=="bloodtype":
            self.BloodType+=text
    # We use the non-namespace-aware element handlers:
    def startElement(self,tag,attributes):
        self.CurrentData=tag
    def endElement(self,tag):
        if self.CurrentData=="bloodtype":
            print "Blood type:",self.BloodType
        self.CurrentData=""

if (__name__=="__main__"):
    # create an XMLReader
    MyParser = xml.sax.make_parser()
    # turn off namespaces
    MyParser.setFeature(xml.sax.handler.feature_namespaces, 0)
    # override the default ContentHandler
    Handler=ExamHandler()
    MyParser.setContentHandler(Handler)
    # Build and parse an InputSource
    StringFile=cStringIO.StringIO(SAMPLE_DATA)
    MySource = xml.sax.InputSource("1")
    MySource.setByteStream(StringFile)
    MyParser.parse(MySource)


Using parser (XMLReader) objects

The base parser class is xml.sax.xmlreader.XMLReader. It is normally not
necessary to instantiate parser objects directly. However, you can access a parser
to exercise tighter control on XML parsing.

The method xml.sax.make_parser([parserlist]) creates and returns an XML
parser. If you want to use a specific SAX parser (such as Expat), pass the name of its
module in the parserlist sequence. The module in question must define a
create_parser function.

Once you have an XML parser, you can call its method parse(source), where
source is a filelike object, a URL, or a file name.

An XML parser has properties and features, which can be set and queried by name.
For example, the following lines check and toggle namespace mode for a parser:

>>> MyParser=xml.sax.make_parser()
>>> MyParser.getFeature(\
...     "http://xml.org/sax/features/namespaces")
0
>>> # Activate namespace processing
>>> MyParser.setFeature(\
...     "http://xml.org/sax/features/namespaces",1)

The features and properties available vary from parser to parser.
An XMLReader has several helper classes. You can access the parser’s
ContentHandler with the methods getContentHandler and
setContentHandler(handler). Similarly, you can access the parser’s
ErrorHandler (with getErrorHandler and setErrorHandler), its EntityResolver,
and its DTDHandler. The helper classes let you customize the parser’s behavior
further.

ErrorHandler

An ErrorHandler implements three methods to handle errors: error, fatalError,
and warning. Each method takes a SAXParseException as its single parameter.
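A minimal Python 3 ErrorHandler sketch (RecordingErrorHandler is a hypothetical name) that records each problem and still aborts on fatal errors by re-raising the exception it receives:

```python
import xml.sax

class RecordingErrorHandler(xml.sax.ErrorHandler):
    def __init__(self):
        self.messages = []

    def warning(self, exception):
        self.messages.append("warning: " + exception.getMessage())

    def error(self, exception):
        self.messages.append("error: " + exception.getMessage())

    def fatalError(self, exception):
        self.messages.append("fatal: " + exception.getMessage())
        raise exception  # stop parsing, as the default handler does

errors = RecordingErrorHandler()
try:
    xml.sax.parseString(b"<exam><patient>Pat</exam>",
                        xml.sax.ContentHandler(), errors)
    failed = False
except xml.sax.SAXParseException as exc:
    failed = True
    print("line", exc.getLineNumber(), ":", exc.getMessage())
print(errors.messages)
```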

DTDHandler

A DTDHandler handles only notation declarations and unparsed entity declarations.
The method notationDecl(name, PublicID, SystemID) is called when a
notation declaration is encountered. The method
unparsedEntityDecl(name, PublicID, SystemID, text) is called when an
unparsed entity declaration is encountered.

EntityResolver

The XMLReader calls the EntityResolver to handle external entity references. The
method resolveEntity(PublicID, SystemID) is called for each such reference; it
returns either the system identifier (as a string), or an InputSource.

Locator

Most XMLReaders supply a locator to their ContentHandler by calling its
setDocumentLocator method. The locator should only be called by the
ContentHandler in the context of a parsing method (such as characters). The
locator provides the current location, via methods getColumnNumber,
getLineNumber, getPublicId, and getSystemId.
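A Python 3 sketch of a handler that uses the locator to note the line on which each element starts. The locator is consulted only inside a parsing callback, as recommended above; LineTracker is a hypothetical name:

```python
import xml.sax

class LineTracker(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.locator = None
        self.lines = []

    def setDocumentLocator(self, locator):
        self.locator = locator  # keep it for use during callbacks

    def startElement(self, tag, attributes):
        self.lines.append((tag, self.locator.getLineNumber()))

tracker = LineTracker()
xml.sax.parseString(b"<exam>\n<patient>Pat</patient>\n"
                    b"<bloodtype>B</bloodtype>\n</exam>", tracker)
print(tracker.lines)
```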

SAX exceptions

The base exception is SAXException. It is extended by SAXParseException,
SAXNotRecognizedException, and SAXNotSupportedException. The constructors
for SAXNotSupportedException and SAXNotRecognizedException take two
parameters: an error string and (optionally) an additional exception object. The
SAXParseException constructor requires these parameters, as well as a locator.

The message and exception associated with a SAXException can be retrieved by
the methods getMessage and getException, respectively.
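The feature API offers an easy way to see two of these exception classes in action. A sketch with Python 3's default Expat-based reader; the feature URI http://example.com/no-such-feature is deliberately bogus, and Expat does not support validation:

```python
import xml.sax
from xml.sax.handler import feature_validation

parser = xml.sax.make_parser()
caught = []
for name, state in [("http://example.com/no-such-feature", True),
                    (feature_validation, True)]:
    try:
        parser.setFeature(name, state)
    except (xml.sax.SAXNotRecognizedException,
            xml.sax.SAXNotSupportedException) as exc:
        # Both subclass SAXException, so getMessage() is available
        caught.append((type(exc).__name__, exc.getMessage()))
for entry in caught:
    print(entry)
```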




Parsing XML with DOM

The DOM API parses an entire XML document, and stores a DOM (a tree representation
of the document) in memory. It is a very convenient way to parse, although it
does require more memory than SAX. In addition, you can manipulate the DOM
itself, and then write out the new XML document. This is a relatively painless way
to make changes to XML documents.

A DOM is made up of nodes. Each element, each attribute, and even each comment
is a node. The most important node is the document node, which represents the
document as a whole.

The module xml.dom.minidom provides a simple version of the DOM interface. It
provides two functions, parse(file[, parser]) and
parseString(XML[, parser]), to parse XML and return a DOM. (Here parser, if
supplied, must be a SAX parser object; minidom uses SAX internally to generate
its DOM.)


DOM nodes

A node object has a type, represented by the integer attribute nodeType. The valid
node types are available as members of xml.dom.minidom.Node, and include
DOCUMENT_NODE, ELEMENT_NODE, ATTRIBUTE_NODE, and TEXT_NODE.

A node can have a parent (given by its parentNode member), and a list of children
(stored in its childNodes member). You can add child nodes by calling
appendChild(NewChild) or insertBefore(NewChild, OldChild). You can also
remove children by calling removeChild(OldChild). For example:

>>> DOM=xml.dom.minidom.parse("Mystic Mafia.xml") # Build DOM
>>> print DOM.parentNode # The document node has no parent
None
>>> print DOM.childNodes
[<DOM Element: rdf at 10070740>]
>>> print DOM.childNodes[0].childNodes
[<DOM Text node "\n">, <DOM Text node "\n">, <DOM Text node "
">, <DOM Element: rdf:Description at 10052084>, <DOM Text node
"\n">]

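The session above only reads the tree. Here is a small Python 3 sketch of the editing calls (appendChild, insertBefore, and removeChild) on a document built from a string; the list and item element names are made up:

```python
from xml.dom import minidom

doc = minidom.parseString("<list><item>b</item></list>")
root = doc.documentElement
first = root.firstChild               # the existing <item>b</item>

new_a = doc.createElement("item")
new_a.appendChild(doc.createTextNode("a"))
root.insertBefore(new_a, first)       # put "a" before "b"

new_c = doc.createElement("item")
new_c.appendChild(doc.createTextNode("c"))
root.appendChild(new_c)               # add "c" at the end

root.removeChild(first)               # drop "b" again
print(root.toxml())  # <list><item>a</item><item>c</item></list>
```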
Elements, attributes, and text

An element has a name, given by its member tagName. If the element is part of a
namespace, prefix holds its namespace prefix, localName holds its name within the
namespace, and namespaceURI is the URI of the namespace definition. You can retrieve an
element’s attribute values with the method getAttribute(AttributeName), set
attribute values with setAttribute(AttributeName, Value), and remove
attributes with the method removeAttribute(AttributeName).

The text of an element is stored in a child node of type TEXT_NODE. A text node has
an attribute, data, containing its text as a string.

For example, this code examines and edits an element:

>>> print TagNode.tagName,TagNode.prefix
rdf:Description rdf
>>> print TagNode.localName,TagNode.namespaceURI
Description http://www.w3.org/1999/02/22-rdf-syntax-ns#
>>> TagNode.getAttribute("type") # Value is Unicode
u'catalog'
>>> CNode.setAttribute("arglebargle", "test")
>>> CNode.getAttribute("arglebargle")
'test'
>>> CNode.removeAttribute("arglebargle")
>>> # Getting a nonexistent attribute returns ""
>>> CNode.getAttribute("arglebargle")
The document node (DOM)

A document node, or DOM, provides a handy method,
getElementsByTagName(Name), which returns a list of all the element nodes with
the specified name. This is a quick way to find the elements you care about, without
writing your own loop over the other nodes in the document.

A DOM also provides methods to create new nodes. The method
createElement(TagName) creates a new element node, createTextNode(Text)
creates a new text node, and so on. The method toxml returns the DOM as an XML string.

When you are finished with a DOM, call its method unlink to clean it up.
Otherwise, the memory used by the DOM may not get garbage-collected until your
program terminates.
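Here is a minimal Python 3 sketch that puts those document-level methods together. It builds a tiny document, finds elements by tag name, serializes the tree, and then frees it. The exam and patient tag names are invented for the example:

```python
from xml.dom import minidom

doc = minidom.Document()
exam = doc.createElement("exam")
doc.appendChild(exam)
for name in ["Pat", "Chris"]:
    patient = doc.createElement("patient")
    patient.appendChild(doc.createTextNode(name))
    exam.appendChild(patient)

names = [node.firstChild.data for node in doc.getElementsByTagName("patient")]
xml_text = doc.toxml()
print(names)  # ['Pat', 'Chris']
print(xml_text)
doc.unlink()  # break internal references so the tree can be reclaimed
```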

Example: data import and export with DOM

XML is great for data interchange. Listing 18-5 is an example of XML’s power: It
exports data from a relational database as XML, and imports XML back into
the database. It uses the mxODBC module for database access. This test code
assumes the existence of an EMPLOYEE table (see Chapter 14 for the table’s
definition, and more information on the Python DB API).
Listing 18-5: XMLDB.py


import xml.dom.minidom
import ODBC.Windows # Replace for your OS as needed
import sys
import traceback

IMPORTABLE_XML = """<?xml version="1.0"?><tabledata><row>
<EMPLOYEE_ID>55</EMPLOYEE_ID><FIRST_NAME>Bertie</FIRST_NAME>
<LAST_NAME>Jenkins</LAST_NAME><MANAGER_ID></MANAGER_ID>
</row></tabledata>"""

def ExportXMLFromTable(Cursor):
    # We build up a DOM tree programmatically, then
    # convert the DOM to XML. We never have to process
    # the XML string directly (Hooray for DOM!)
    DOM=xml.dom.minidom.Document()
    TableElement=DOM.createElement("tabledata")
    DOM.appendChild(TableElement)
    while (1):
        DataRow=Cursor.fetchone()
        if DataRow==None: break # There is no more data
        RowElement=DOM.createElement("row")
        TableElement.appendChild(RowElement)
        for Index in range(len(Cursor.description)):
            ColumnName=Cursor.description[Index][0]
            ColumnElement=DOM.createElement(ColumnName)
            RowElement.appendChild(ColumnElement)
            ColumnValue=DataRow[Index]
            # Keep zero and other false-looking values; skip only NULL
            if ColumnValue is not None:
                TextNode=DOM.createTextNode(\
                    str(DataRow[Index]))
                ColumnElement.appendChild(TextNode)
    # Return the XML so the caller can print or save it
    return DOM.toxml()

def ImportXMLToTable(Cursor,XML,TableName):
    # Build up the SQL statement corresponding to the XML
    DOM=xml.dom.minidom.parseString(XML)
    DataRows=DOM.getElementsByTagName("row")
    for RowElement in DataRows:
        InsertSQL="INSERT INTO %s ("%TableName
        for ChildNode in RowElement.childNodes:
            if ChildNode.nodeType==\
                xml.dom.minidom.Node.ELEMENT_NODE:
                InsertSQL+="%s,"%ChildNode.tagName
        InsertSQL=InsertSQL[:-1] # Remove trailing comma
        InsertSQL+=") values ("
        for ChildNode in RowElement.childNodes:
            if ChildNode.nodeType==\
                xml.dom.minidom.Node.ELEMENT_NODE:
                ColumnValue=GetNodeText(ChildNode)
                InsertSQL+="%s,"%SQLEscape(ColumnValue)
        InsertSQL=InsertSQL[:-1] # Remove trailing comma
        InsertSQL+=")"
        Cursor.execute(str(InsertSQL))

def SQLEscape(Value):
    if (Value in [None,""]):
        return "Null"
    else:
        # Quote the value, doubling any embedded single quotes
        return "'%s'"%Value.replace("'","''")

def GetNodeText(ElementNode):
    # Concatenate all text child nodes into one large string.
    # (The normalize() method, available in version 2.1, makes
    # this a little easier by conglomerating adjacent
    # text nodes for us)
    NodeText=""
    for ChildNode in ElementNode.childNodes:
        if ChildNode.nodeType==xml.dom.minidom.Node.TEXT_NODE:
            NodeText+=ChildNode.data
    return NodeText

if (__name__=="__main__"):
    print "Testing XML export..."
    # Replace this line with your database connection info:
    Conn=ODBC.Windows.connect("AQUA","aqua","aqua")
    Cursor=Conn.cursor()
    Cursor.execute("select * from EMPLOYEE")
    print ExportXMLFromTable(Cursor)
    # Delete employee 55 so that we can import him again
    Cursor.execute("DELETE FROM EMPLOYEE WHERE EMPLOYEE_ID = 55")
    print "Testing XML import..."
    ImportXMLToTable(Cursor,IMPORTABLE_XML,"EMPLOYEE")
    # Remove this line if your database does not have
    # transaction support:
    Conn.commit()
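mxODBC is a third-party module, so here is a Python 3 sketch of the same export/import round trip using only the standard library. It substitutes an in-memory sqlite3 database; the EMPLOYEE column types are an assumption inferred from IMPORTABLE_XML. It also passes values through ? placeholders instead of escaping them by hand:

```python
import sqlite3
from xml.dom import minidom

def export_xml(cursor):
    """Turn the rows of an executed query into <tabledata><row>... XML."""
    doc = minidom.Document()
    table = doc.appendChild(doc.createElement("tabledata"))
    columns = [desc[0] for desc in cursor.description]
    for row in cursor.fetchall():
        row_element = table.appendChild(doc.createElement("row"))
        for name, value in zip(columns, row):
            column = row_element.appendChild(doc.createElement(name))
            if value is not None:
                column.appendChild(doc.createTextNode(str(value)))
    return doc.toxml()

def import_xml(cursor, xml_text, table_name):
    """Insert one row per <row> element; empty elements become NULL."""
    doc = minidom.parseString(xml_text)
    for row in doc.getElementsByTagName("row"):
        names, values = [], []
        for child in row.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                text = "".join(n.data for n in child.childNodes
                               if n.nodeType == n.TEXT_NODE)
                names.append(child.tagName)
                values.append(text or None)
        # Table and column names are interpolated: use trusted XML only
        sql = "INSERT INTO %s (%s) VALUES (%s)" % (
            table_name, ",".join(names), ",".join("?" * len(values)))
        cursor.execute(sql, values)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (EMPLOYEE_ID INTEGER, FIRST_NAME TEXT,"
            " LAST_NAME TEXT, MANAGER_ID INTEGER)")
import_xml(cur, "<tabledata><row><EMPLOYEE_ID>55</EMPLOYEE_ID>"
           "<FIRST_NAME>Bertie</FIRST_NAME><LAST_NAME>Jenkins</LAST_NAME>"
           "<MANAGER_ID></MANAGER_ID></row></tabledata>", "EMPLOYEE")
cur.execute("SELECT * FROM EMPLOYEE")
exported = export_xml(cur)
print(exported)
```

Because values travel as parameters, the hand-written SQLEscape step is unnecessary in this variant. The table and column names are still pasted into the SQL text, however.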


Parsing XML with xmllib


The module xmllib defines a single class, XMLParser, whose methods are similar
to those of htmllib.HTMLParser. You can define start and end handlers for any tag.
Listing 18-6 is a simple example that parses a patient’s blood type from examination
data.



Unlike xml.sax and xml.dom, xmllib doesn't require any extra modules to be built.
Also, it is quite simple, and similar to htmllib. However, it is not a fast parser, and
it is deprecated as of version 2.0.

This example stores the blood type using one or more calls to handle_data.
Strings may be passed to handle_data all at once or in several pieces.


Listing 18-6: BloodType.py 


import xmllib

SAMPLE_DATA = """<?xml version="1.0"?>
<exam date="5/13/99">
<patient>Pat</patient>
<bloodtype>B</bloodtype>
</exam>"""

class ExamParser(xmllib.XMLParser):
    def __init__(self):
        xmllib.XMLParser.__init__(self)
        self.CurrentData="" # Track current data item
        self.BloodType=""
    def start_bloodtype(self,args):
        self.CurrentData="blood"
    def end_bloodtype(self):
        if (self.CurrentData=="blood"):
            print "Blood type:",self.BloodType
            self.CurrentData=""
    def handle_data(self,text):
        if (self.CurrentData=="blood"):
            self.BloodType+=text

if (__name__=="__main__"):
    MyParser = ExamParser()
    MyParser.feed(SAMPLE_DATA)
    MyParser.close()
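xmllib was removed in Python 3. Here is a sketch of the same idea using a parser that still ships with Python: the blood-type example rewritten for xml.sax. The ContentHandler callbacks startElement, endElement, and characters play the roles of start_bloodtype, end_bloodtype, and handle_data. Like handle_data, characters may receive the text in several pieces, so the handler accumulates it.

```python
import xml.sax

SAMPLE_DATA = b"""<?xml version="1.0"?>
<exam date="5/13/99">
<patient>Pat</patient>
<bloodtype>B</bloodtype>
</exam>"""

class ExamHandler(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.InBloodType = False  # True while inside <bloodtype>
        self.BloodType = ""

    def startElement(self, name, attrs):
        if name == "bloodtype":
            self.InBloodType = True

    def endElement(self, name):
        if name == "bloodtype":
            self.InBloodType = False
            print("Blood type: %s" % self.BloodType)

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self.InBloodType:
            self.BloodType += content

Handler = ExamHandler()
xml.sax.parseString(SAMPLE_DATA, Handler)
```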


Elements and attributes 

The XMLParser attribute elements is a dictionary of known tags. If you subclass
XMLParser with a parser that handles a particular tag, then that tag should exist as
a key in elements. The corresponding value is a tuple (StartHandler,EndHandler),
where StartHandler and EndHandler are functions for handling the start and end of
that tag. Normally, you don’t need to access elements directly, as handlers of the
form start_xxx and end_xxx are inserted automatically.

The attribute attributes is a dictionary tracking the valid attributes for tags. The
keys in attributes are known tags. The values are dictionaries that map all valid
attributes for the tag to a default value (or to None, if there is no default value). If
any other attribute is encountered in parsing, the method syntax_error is called.
By default, attributes is an empty dictionary, and any attributes are permitted for
any tag.
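xml.sax has no built-in equivalent of the attributes table, but the idea is easy to imitate. In this sketch, KNOWN_ATTRIBUTES is a hypothetical table shaped like XMLParser.attributes: it maps a tag name to a dictionary of valid attributes and their defaults. Raising ValueError stands in for syntax_error.

```python
import xml.sax

# Hypothetical table in the spirit of XMLParser.attributes: tag name ->
# {valid attribute: default value, or None for no default}.
# Tags missing from the table are not checked at all.
KNOWN_ATTRIBUTES = {
    "exam": {"date": None, "clinic": "main"},
}

class CheckingHandler(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.Seen = []  # (tag, attributes with defaults filled in)

    def startElement(self, name, attrs):
        Values = dict(attrs.items())
        Allowed = KNOWN_ATTRIBUTES.get(name)
        if Allowed is not None:
            for AttrName in Values:
                if AttrName not in Allowed:
                    # Stands in for XMLParser.syntax_error
                    raise ValueError("unknown attribute %r on <%s>" % (AttrName, name))
            for AttrName, Default in Allowed.items():
                if AttrName not in Values and Default is not None:
                    Values[AttrName] = Default
        self.Seen.append((name, Values))

Handler = CheckingHandler()
xml.sax.parseString(b'<exam date="5/13/99"><patient>Pat</patient></exam>', Handler)
print(Handler.Seen)
```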
XML handlers

XMLParser defines various methods to handle XML elements. These methods do
nothing by default, and are intended to be overridden in a subclass.

The method handle_xml(encoding, standalone) is called when the <?xml ?> tag
is parsed. The parameters encoding and standalone equal the corresponding
attributes in the tag.

The method handle_doctype(root_tag, public_id, sys_id, data) is called
when the <!DOCTYPE> tag is parsed. The parameters root_tag, public_id, sys_id, and
data are the root tag name, the DTD public identifier, the system identifier, and the
unparsed DTD contents, respectively.

The method handle_cdata(text) is called when a CDATA section of the form
<![CDATA[text]]> is encountered. (Normal data is passed to handle_data.)

The method handle_proc(name, text) is called when a processing instruction of
the form <?name text?> is encountered.

The method handle_special(text) is called for declarations of the form <!text>.
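The expat parser (xml.parsers.expat), which still ships with Python, reports the same events through handler attributes rather than overridable methods. The rough correspondence is:

- XmlDeclHandler for handle_xml.
- StartDoctypeDeclHandler for handle_doctype. It receives the system identifier before the public one.
- ProcessingInstructionHandler for handle_proc.
- StartCdataSectionHandler and EndCdataSectionHandler, together with CharacterDataHandler, for handle_cdata.

The sketch below records these events. The sample document and the exam.dtd name are made up for illustration.

```python
import xml.parsers.expat

# Made-up sample covering an XML declaration, a DOCTYPE, a processing
# instruction, and a CDATA section. (exam.dtd is never actually loaded.)
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE exam SYSTEM "exam.dtd">
<?render style="plain"?>
<exam><![CDATA[<raw> & unescaped]]></exam>"""

class EventLog:
    def __init__(self):
        self.Events = []
        self.InCData = False
        self.CDataChunks = []

    def XmlDecl(self, version, encoding, standalone):
        # standalone is 1 for "yes", 0 for "no", -1 if omitted
        self.Events.append(("xml", version, encoding, standalone))

    def StartDoctype(self, name, system_id, public_id, has_internal_subset):
        self.Events.append(("doctype", name, system_id, public_id))

    def ProcessingInstruction(self, target, data):
        self.Events.append(("proc", target, data))

    def StartCData(self):
        self.InCData = True
        self.CDataChunks = []

    def EndCData(self):
        self.InCData = False
        self.Events.append(("cdata", "".join(self.CDataChunks)))

    def CharacterData(self, data):
        # Inside a CDATA section, text arrives here, possibly in pieces.
        if self.InCData:
            self.CDataChunks.append(data)

Log = EventLog()
Parser = xml.parsers.expat.ParserCreate()
Parser.XmlDeclHandler = Log.XmlDecl
Parser.StartDoctypeDeclHandler = Log.StartDoctype
Parser.ProcessingInstructionHandler = Log.ProcessingInstruction
Parser.StartCdataSectionHandler = Log.StartCData
Parser.EndCdataSectionHandler = Log.EndCData
Parser.CharacterDataHandler = Log.CharacterData
Parser.Parse(SAMPLE, True)
for Event in Log.Events:
    print(Event)
```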

Other XMLParser members 

The method syntax_error(errormessage) is called when unparsable XML is
encountered. By default, this method raises a RuntimeError exception.

The method translate_references(text) translates all entity and character
references in text, and returns the resulting string.

The method getnamespace returns a dictionary mapping abbreviations from the
current namespace to URIs.
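translate_references has no direct counterpart in xml.sax or expat, because both translate references as they parse. For a standalone string, a minimal sketch covering character references and XML's five predefined entities might look like this. TranslateReferences is a hypothetical helper, not part of any library, and it is written for Python 3, where chr accepts any code point. Entities declared in a DTD are ignored unless you add them to the Entities dictionary.

```python
import re

# The five entities predefined by XML 1.0.
PREDEFINED_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# &#xHH; (hex character reference), &#NNN; (decimal), or &name;
REFERENCE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_][A-Za-z0-9._-]*);")

def TranslateReferences(Text, Entities=PREDEFINED_ENTITIES):
    # Replace character references and known entity references;
    # unknown entity names are left exactly as they were.
    def Replace(Match):
        Name = Match.group(1)
        if Name.startswith("#x"):
            return chr(int(Name[2:], 16))
        if Name.startswith("#"):
            return chr(int(Name[1:]))
        return Entities.get(Name, Match.group(0))
    return REFERENCE.sub(Replace, Text)

print(TranslateReferences("a &lt; b &amp;&amp; &#65;&#x42; &unknown;"))
```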


Summary 

You can easily parse HTML by subclassing the standard parser. There are several
varieties of parsers for XML, which you can customize to handle any kind of
document. In this chapter, you:

Parsed HTML with and without an output-formatter.

Built a robot to automatically retrieve Web pages.

Parsed and generated XML files for data exchange.

In the next chapter, you’ll meet Tkinter, Python’s de facto standard library for user
interfaces.
Tinkering with Tkinter


Tkinter is a package used for building a graphical user
interface (GUI) in Python. It runs on many operating systems,
including UNIX, Windows, and Macintosh. Tkinter is the de facto
standard GUI library for Python, and is often bundled with it.

Tkinter is very easy to use; it is built on top of Tk, the GUI
toolkit of the high-level scripting language Tcl.


Getting Your Feet Wet

If you’re dying to see Tkinter in action, the program shown in
Listing 19-1 should provide some instant gratification. It
displays some text in a window. Notice how little code it
takes: such are the joys of Tkinter!


Listing 19-1: HelloWorld.py


import Tkinter

# Create the root window:
root=Tkinter.Tk()

# Put a label widget in the window:
LabelText="Ekky-ekky-ekky-ekky-z'Bang, zoom-Boing, z'nourrrwringmm"
LabelWidget=Tkinter.Label(root,text=LabelText)

# Pack the label (position and display it):
LabelWidget.pack()

# Start the event loop. This call won't return
# until the program ends:
root.mainloop()
In This Chapter

Using common options

Gathering user input

Using Tkinter dialogs

fonts

Drawing graphics

Using timers


Run the code, and you’ll see something resembling the
screenshot shown in Figure 19-1.
Figure 19-1: Greetings from Tkinter


Note: On Windows, Tkinter applications look more professional when you run them with
pythonw.exe instead of python.exe. Giving a script a .pyw extension sends it to
pythonw instead of python. Pythonw does not create a console window; the
disadvantage of this is that you can't see anything printed to sys.stdout and
sys.stderr.


Creating a GUI 

To use Tkinter, import the Tkinter module. Many programmers import it into the
local namespace (from Tkinter import *); this is less explicit, but it does save
some typing. This chapter’s examples don’t import Tkinter into the local
namespace, in order to make it obvious when they use Tkinter.

Building an interface with widgets 

A user interface contains various widgets. A widget is an object displayed onscreen
with which the user can interact. (Java calls such things components, and Microsoft
calls them controls.) Tkinter provides a button widget (Tkinter.Button), a label
widget (Tkinter.Label), and so on. Most widgets are displayed on a parent
widget, or owner. The first argument to a widget’s constructor is its parent widget.

A Toplevel widget is a special widget that is not displayed inside another widget; it
is a top-level window in its own right. Most applications need only one Toplevel
widget: the root widget created when you call Tkinter.Tk().

A frame, for instance, is a widget whose purpose in life is to contain other widgets.
Putting related widgets in one frame is a great way to group them onscreen:

MainWindow=Tkinter.Tk() # Create a top-level window
UpperFrame=Tkinter.Frame(MainWindow)

# The label and the button both live inside UpperFrame:
UpperLabel=Tkinter.Label(UpperFrame)
UpperButton=Tkinter.Button(UpperFrame)


Widget options 

Widgets have options (or attributes) that control their look and behavior. Some
options are used by many widgets. For example, most widgets have a background
option, specifying the widget’s normal background color. Other options are specific
to a particular kind of widget. For example, a button widget has a command option,
whose value is a function to call (without arguments) when the button is clicked.

You can access options in various ways: 

# You can set options in the constructor:
NewLabel=Tkinter.Label(ParentFrame,background="gray50")

# You can access options dictionary-style (my favorite!)
NewLabel["background"]="#FFFFFF"

# You can set options with the config method:
NewLabel.config(background="blue")

# You can retrieve an option's current value:
CurrentColor=NewLabel["background"]

# Another way to get the current value:
CurrentColor=NewLabel.cget("background")

A few option names are, coincidentally, reserved words in Python. When necessary,
append an underscore to such option names:

# "from" is a reserved word. Use from_ in code:
VolumeWidget=Tkinter.Scale(ParentFrame,from_=0,to=200)

# Use "from" when passing the option name as a string:
VolumeWidget["from"]=20 # "from_" is *not* ok here

See “Using Common Options” for an overview of the most useful widget options. 


Laying Out Widgets 

The geometry manager is responsible for positioning widgets onscreen. The
simplest geometry manager is the packer. The packer can position a widget on the left
(Tkinter.LEFT), right, top, or bottom side of its parent. You invoke the packer by
calling the pack method on a widget.

The grid geometry manager divides the parent widget into a grid, and places each
child widget on a square of the grid. You invoke the grid geometry manager by
calling the grid(row=x, column=y) method on a widget. Grid square numbering
starts with 0.

You can also position a widget precisely using place. However, using place is 
recommended only for perfectionists and masochists! If you use the placer, then 
whenever you add a widget to your design, you’ll need to reposition all the other 
widgets. 

Different geometry managers don’t get along well — if you pack one child widget 
and grid another, Tkinter may enter a catatonic state. You can use pack and grid 
in the same program, but not within the same parent widget! 

Note: Remember to call pack, grid, or place on every widget. Otherwise, the widget
will never be displayed, making it rather difficult to click on!

Packer options 

Following are options you can pass to the pack method. These options override the
default packing. The default packing lays widgets out from top to bottom within
their parent (side=TOP). Each widget is centered within the available space
(anchor=CENTER). It does not expand to fill its space (expand=NO), and it has no
extra padding on the sides (padx=pady=0).

side 

Passing a side option to pack places the widget on the specified side of its parent.
Valid values are LEFT, RIGHT, TOP, and BOTTOM. The default is TOP. If two widgets are
both packed on one side of a parent, the first widget packed is the closest to the edge:

Label1=Tkinter.Label(root,text="PackedLast")
Label2=Tkinter.Label(root,text="PackedFirst")
Label2.pack(side=Tkinter.LEFT) # leftmost!
Label1.pack(side=Tkinter.LEFT) # Placed to the right of Label2

Mixing LEFT/RIGHT with TOP/BOTTOM in one parent widget often yields
creepy-looking results. When packing many widgets, it’s generally best to use
intermediate frame widgets, or use the grid geometry manager.

fill, expand

Pass a value of YES for expand to let a widget expand to fill all available space. Pass
either X, Y, or BOTH for fill to specify which dimensions the widget stretches in. These
options are especially useful when a user resizes the window. For example, the following
code creates a canvas that stretches to the edges of the window, and a status bar
(at the bottom) that stretches horizontally:

DrawingArea=Tkinter.Canvas(root)
DrawingArea.pack(expand=Tkinter.YES,fill=Tkinter.BOTH)
StatusBar=Tkinter.Label(root,text="Ready.")
# No expand here, so all extra space goes to the canvas and the
# status bar stays flush against the bottom edge:
StatusBar.pack(side=Tkinter.BOTTOM,fill=Tkinter.X)


anchor 

If the widget has more screen space than it needs, the anchor option determines
where the widget sits within its allotted space. This does not affect widgets with
fill=BOTH. Valid values are compass directions (N, NW, W, SW, S, SE, E, NE) and
CENTER.

padx,pady 

These options give a widget some additional horizontal or vertical “elbow room.”
Putting a little space between buttons makes them more readable, and makes it
harder to click the wrong one:

Button1=Tkinter.Button(root,text="Fire death ray",
    command=FireDeathRay)
# 10 empty pixels on both sides:
Button1.pack(side=Tkinter.LEFT,padx=10)
Button2=Tkinter.Button(root,text="Send flowers",
    command=PatTheBunny)
# 10+10=20 pixels between buttons:
Button2.pack(side=Tkinter.LEFT,padx=10)

Grid options 

Following are options to pass to the grid method. You should specify a row and a 
column for every widget; otherwise, things get confusing. 


row, column 

Pass row and column options to specify which grid square your widget should live
in. The numbering starts at 0; you can always add new rows and columns. For
example, the following code lays out some buttons to look like a telephone’s dial pad:

for Digit in range(9):
    # divmod gives (Digit/3, Digit%3): three buttons per row
    Row,Column=divmod(Digit,3)
    Tkinter.Button(root,text=str(Digit+1)).grid(row=Row,column=Column)
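The row and column arithmetic is just integer division and remainder. This plain-Python sketch computes the same grid squares so you can check the layout without opening a window. DialPadLayout is a hypothetical helper.

```python
def DialPadLayout():
    # Map each key label "1".."9" to the (row, column) grid square the
    # loop above gives it: divmod(Digit, 3) puts three keys in each row.
    Layout = {}
    for Digit in range(9):
        Layout[str(Digit + 1)] = divmod(Digit, 3)
    return Layout

Layout = DialPadLayout()
for Row in range(3):
    # Collect the labels in this row, in column order.
    Keys = sorted((Square[1], Label) for Label, Square in Layout.items()
                  if Square[0] == Row)
    print(" ".join(Label for Column, Label in Keys))
```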


sticky 

This option specifies which side of the square the widget should “stick to.” It is
similar to anchor (for the packer). Valid values are combinations of the compass
directions N, S, E, and W; the default (an empty string) centers the widget in its cell.
You can combine values to stretch the widget within its cell. For example, the following
button fills its grid cell:

BigButton=Tkinter.Button(root,text="X")
# Using "from Tkinter import *" would let this next line
# be much less messy:
BigButton.grid(row=0,column=0,sticky=Tkinter.W+Tkinter.E+\
    Tkinter.N+Tkinter.S)


columnspan,rowspan 

These options let you create a big widget (one that spans multiple rows or 
columns). 


Example: Breakfast Buttons 

Listing 19-2 presents a beefier Tkinter program. It provides a food menu, with 
several buttons you can click to build up a complete breakfast. Your selection is 
displayed on a multiline label. Figure 19-2 shows the resulting user interface. 

This example initializes widgets in several different ways. In practice, you’ll want to
do it the same way every time. (Personally, I like the pattern for the “Spam” button,
and I hate the pattern for the “Beans” button.)


Listing 19-2: FoodChoice.py 


import Tkinter

# In Tkinter, a common practice is to subclass Tkinter.Frame, and make
# the subclass represent "the application itself". This is
# convenient (although, in some cases, the separation
# between logic and UI should be clearer). FoodWindow is our application:
class FoodWindow(Tkinter.Frame):
    def __init__(self):
        # Call the superclass constructor explicitly:
        Tkinter.Frame.__init__(self)
        self.FoodItems=[]
        self.CreateChildWidgets()

    def CreateChildWidgets(self):
        ButtonFrame=Tkinter.Frame(self)
        # The fill parameter tells the packer that this widget should
        # stretch horizontally to fill its parent widget:
        ButtonFrame.pack(side=Tkinter.TOP,fill=Tkinter.X)
        # Create a button, on the button frame:
        SpamButton=Tkinter.Button(ButtonFrame)
        # Button["text"] is the button label:
        SpamButton["text"]="Spam"
        # Button["command"] is the function to execute (without arguments)
        # when someone clicks the button:
        SpamButton["command"]=self.BuildButtonAction("Spam")
        SpamButton.pack(side=Tkinter.LEFT)
        # You can specify most options by passing keyword-arguments
        # to the widget's constructor:
        EggsAction=self.BuildButtonAction("Eggs")
        EggsButton=Tkinter.Button(ButtonFrame,text="Eggs",command=EggsAction)
        # This is the second widget packed on the LEFT side of ButtonFrame, so
        # it goes to the right of the "Spam" button:
        EggsButton.pack(side=Tkinter.LEFT)
        # Some people like to do everything all in one go:
        Tkinter.Button(ButtonFrame,text="Beans",\
            command=self.BuildButtonAction("Beans")).pack(side=Tkinter.LEFT)
        # You can also set widget options with the "config" method:
        SausageButton=Tkinter.Button(ButtonFrame)
        SausageAction=self.BuildButtonAction("Sausage")
        SausageButton.config(text="Sausage",command=SausageAction)
        SausageButton.pack(side=Tkinter.LEFT)
        # It's often good for parent widgets to keep references to their
        # children. Here, we keep a reference (self.FoodLabel) to the label, so
        # we can change it later:
        self.FoodLabel=Tkinter.Label(self,wraplength=190,\
            relief=Tkinter.SUNKEN,borderwidth=2,text="")
        self.FoodLabel.pack(side=Tkinter.BOTTOM,pady=10,fill=Tkinter.X)
        # Packing top-level widgets last often saves some repainting:
        self.pack()

    def ChooseFood(self,FoodItem):
        # Add FoodItem to our list of foods, and build a nice
        # string listing all the food choices:
        self.FoodItems.append(FoodItem)
        LabelText=""
        TotalItems=len(self.FoodItems)
        for Index in range(TotalItems):
            if (Index>0):
                LabelText+=", "
            if (TotalItems>1 and Index==TotalItems-1):
                LabelText+="and "
            LabelText+=self.FoodItems[Index]
        self.FoodLabel["text"]=LabelText

    # Lambda forms are a convenient way to define commands, especially when
    # several buttons do similar things. I put the lambda-construction in its
    # own function, to prevent duplicated code for each button:
    def BuildButtonAction(self,Label):
        # Note: Without nested scopes (the default before Python 2.2), a
        # lambda can't see any names from the enclosing scope. So, we pass
        # in self and Label as default arguments:
        Action=lambda Food=self,Text=Label: Food.ChooseFood(Text)
        return Action

if (__name__=="__main__"):
    MainWindow=FoodWindow()
    MainWindow.mainloop()
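Neither the string-building in ChooseFood nor the default-argument trick in BuildButtonAction depends on Tkinter, so both can be tried without opening a window. In this sketch, DescribeFoods and MakeActions are hypothetical stand-alone versions of those methods, and calling an action stands in for clicking a button.

```python
def DescribeFoods(FoodItems):
    # Same loop as ChooseFood: a comma before every item after the
    # first, and "and " before the last item when there are 2 or more.
    LabelText = ""
    TotalItems = len(FoodItems)
    for Index in range(TotalItems):
        if Index > 0:
            LabelText += ", "
        if TotalItems > 1 and Index == TotalItems - 1:
            LabelText += "and "
        LabelText += FoodItems[Index]
    return LabelText

def MakeActions(Labels, Chosen):
    # One zero-argument callback per label, as BuildButtonAction makes.
    # The default argument Text=Label stores each label when its lambda
    # is created; a plain closure over the loop variable would see only
    # the final label.
    return [lambda Text=Label: Chosen.append(Text) for Label in Labels]

Chosen = []
Actions = MakeActions(["Spam", "Eggs", "Beans"], Chosen)
Actions[0]()  # "click" Spam
Actions[2]()  # "click" Beans
Actions[0]()  # Spam again
print(DescribeFoods(Chosen))
```

With exactly two items, the loop produces "Spam, and Eggs". Skipping the comma in that case would take only a small change to the first if statement.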
Figure 19-2: Responding to buttons


Using Common Options 

The following sections provide an overview of the most commonly used widget 
options, organized by category. Those options that apply to button widgets also 
apply to check button and radio button widgets. 


Color options 

The following options control the colors of a widget: 


background, foreground

Background and foreground colors. A synonym for background is bg; a synonym
for foreground is fg.

activebackground, activeforeground

For a button or menu, these options provide colors used when the widget is active.

disabledforeground

Alternative foreground color for a disabled button or menu.

selectforeground, selectbackground

Alternative colors for the selected element(s) of a Canvas, Entry, Text, or Listbox
widget.

highlightcolor, highlightbackground

Colors for the focus-highlight rectangle drawn around a widget: highlightcolor when
the widget has the keyboard focus, highlightbackground when it does not.
Size options

The following options govern the size and shape of a widget.


width

Widget width, as measured in average-sized characters of the
widget’s font. A value of 0 (the default) makes the widget just
large enough to hold its current text.

height

Widget height, as measured in lines of text.

padx, pady

Amount of extra internal horizontal or vertical padding, in
pixels. Generally ignored if the widget is displaying a bitmap
or image.


Appearance options 

The following options, together with the color and size options, control a widget’s
appearance:


text

Text to display in the widget.

image

Image for display in a button or label. If an image is supplied,
any text option is ignored. Pass an empty string for image to
remove an image.

relief

Specifies a 3-D border for the widget. Valid values are FLAT,
GROOVE, RAISED, RIDGE, SOLID, and SUNKEN.

borderwidth

Width of the widget’s 3-D border, in pixels.

font

The font to use for text drawn inside the widget.


Behavior options 

The following options affect the behavior of a widget: 


command

Specifies a function to be called when the widget is used. Applies to
buttons, scales, and scrollbars. A button calls the function without
arguments when it is clicked; a scale passes its new value, and a
scrollbar passes arguments describing the scroll.

state

Sets a widget state to NORMAL, ACTIVE, or DISABLED. A DISABLED
widget ignores user input, and (usually) appears grayed-out. The
ACTIVE state changes the widget’s color (using the activebackground
and activeforeground colors).

underline

The index of a character in the widget’s text to underline. In menus
and menubuttons, the underlined letter becomes the “hot key” for
choosing that item from the keyboard; for other widgets, you must
bind the key yourself.

takefocus

If true, the widget is part of the “tab order”: when you cycle
through widgets by hitting Tab, this widget will get the focus.
Gathering User Input

Many widgets collect input from the user. For example, the Entry widget enables
the user to enter a line of text, and the Checkbutton widget can be switched on and off.
Most such widgets store their value in a Tkinter variable. Tkinter variable classes
include StringVar, IntVar, DoubleVar, and BooleanVar. Each Tkinter variable
class provides set and get methods to access its value:

>>> Text=Tkinter.StringVar()
>>> Text.get()
''
>>> Text.set("Howdy!")
>>> Text.get()
'Howdy!'

You hook a widget to a variable by setting one of the widget’s options. A check
button generally uses a BooleanVar, attached using the variable option:

SmokingFlag=BooleanVar()
B1=Checkbutton(ParentFrame,text="Smoking",variable=SmokingFlag)
# This line sets the variable *and* checks the Checkbutton:
SmokingFlag.set(1)

The Entry widget generally uses a StringVar, attached using the textvariable option.
An OptionMenu takes its StringVar as the second argument to its constructor:

# PetBunnyName.get() and NameEntry.get() will both
# return the contents of the entry widget:
PetBunnyName=StringVar()
PetBunnyName.set("Bubbles") # initial contents of the entry
NameEntry=Entry(ParentFrame,textvariable=PetBunnyName)

ChocolateName=StringVar()
FoodChoice=OptionMenu(ParentFrame,ChocolateName,
    "Crunchy Frog","Spring Surprise","Anthrax Ripple")

Several Radiobutton widgets can share one variable, attached to the variable
option. The value option stores that button’s value; I like to make the value the
same as the radio button’s label:

Flavor=StringVar()
Chocolate=Radiobutton(ParentFrame,variable=Flavor,
    text="Chocolate",value="Chocolate")
Strawberry=Radiobutton(ParentFrame,variable=Flavor,
    text="Strawberry",value="Strawberry")
Albatross=Radiobutton(ParentFrame,variable=Flavor,
    text="Albatross",value="Albatross")

Some widgets, such as Listbox and Text, use custom methods (not Tkinter
variables) to access their contents. Accessors for these widgets are described together
with the widgets.


Example: Printing Fancy Text 

The program in Listing 19-3 can print text in various colors and sizes. It uses
various widgets, attached to Tkinter variables, to collect user input. Figure 19-3 shows
the program in action.


Listing 19-3: UserInput.py


import Tkinter
import tkFont # the Font class lives here!

class MainWindow(Tkinter.Frame):
    def __init__(self):
        Tkinter.Frame.__init__(self)
        # Use Tkinter variables to hold user input:
        self.Text=Tkinter.StringVar()
        self.ColorName=Tkinter.StringVar()
        self.BoldFlag=Tkinter.BooleanVar()
        self.UnderlineFlag=Tkinter.BooleanVar()
        self.FontSize=Tkinter.IntVar()
        # Set some default values:
        self.Text.set("Ni! Ni! Ni!")
        self.FontSize.set(12)
        self.ColorName.set("black")
        self.TextItem=None
        # Create all the widgets:
        self.CreateWidgets()

    def CreateWidgets(self):
        # Let the user specify text:
        TextFrame=Tkinter.Frame(self)
        Tkinter.Label(TextFrame,text="Text:").pack(side=Tkinter.LEFT)
        Tkinter.Entry(TextFrame,textvariable=self.Text).pack(side=Tkinter.LEFT)
        TextFrame.pack()
        # Let the user select a color:
        ColorFrame=Tkinter.Frame(self)
        Colors=["black","red","green","blue","deeppink"]
        Tkinter.Label(ColorFrame,text="Color:").pack(side=Tkinter.LEFT)
        Tkinter.OptionMenu(ColorFrame,self.ColorName,"white",*Colors).pack(\
            side=Tkinter.LEFT)
        ColorFrame.pack()
        # Let the user select a font size:
        SizeFrame=Tkinter.Frame(self)
        Tkinter.Radiobutton(SizeFrame,text="Small",variable=self.FontSize,
            value=12).pack(side=Tkinter.LEFT)
        Tkinter.Radiobutton(SizeFrame,text="Medium",variable=self.FontSize,
            value=24).pack(side=Tkinter.LEFT)
        Tkinter.Radiobutton(SizeFrame,text="Large",variable=self.FontSize,
            value=48).pack(side=Tkinter.LEFT)
        SizeFrame.pack()
        # Let the user turn Bold and Underline on and off:
        StyleFrame=Tkinter.Frame(self)
        Tkinter.Checkbutton(StyleFrame,text="Bold",variable=\
            self.BoldFlag).pack(side=Tkinter.LEFT)
        Tkinter.Checkbutton(StyleFrame,text="Underline",variable=\
            self.UnderlineFlag).pack(side=Tkinter.LEFT)
        StyleFrame.pack()
        # Add a button to repaint the text:
        GoFrame=Tkinter.Frame(self)
        Tkinter.Button(GoFrame,text="Go!",command=self.PaintText).pack()
        GoFrame.pack(anchor=Tkinter.W,fill=Tkinter.X)
        # Add a canvas to display the text:
        self.TextCanvas=Tkinter.Canvas(self,height=100,width=300)
        self.TextCanvas.pack(side=Tkinter.BOTTOM)
        # Pack parent-most widget last:
        self.pack()

    def PaintText(self):
        # Erase the old text, if any:
        if (self.TextItem!=None):
            self.TextCanvas.delete(self.TextItem)
        # Set font weight:
        if (self.BoldFlag.get()):
            FontWeight=tkFont.BOLD
        else:
            FontWeight=tkFont.NORMAL
        # Create and configure a Font object.
        # (Use tkFont.families(self) to get a list of available font-families)
        TextFont=tkFont.Font(self,"Courier")
        TextFont.configure(size=self.FontSize.get(),
            underline=self.UnderlineFlag.get(),weight=FontWeight)
        self.TextItem=self.TextCanvas.create_text(5,5,anchor=Tkinter.NW,
            text=self.Text.get(),fill=self.ColorName.get(),font=TextFont)

if (__name__=="__main__"):
    App=MainWindow()
    App.mainloop()
Figure 19-3: Printing fancy text


Using Text Widgets 

The text widget (Tkinter.Text) is a fancy, multiline text-editing widget. It can
even contain embedded windows and graphics. It is an Entry widget on steroids!

The contents of a text widget are indexed by line and column. A typical index has
the form n.m, denoting the character at column m of line n. For example, 5.8 would be
the character at column 8 of line 5. The first line of text is line 1, but the first character
in a line has column 0 (so 5.8 is actually the ninth character on its line). Therefore, the
beginning of a text widget has index 1.0. You can also use the special indices END,
INSERT (the insertion cursor’s location), and CURRENT (the mouse pointer’s location).

You can retrieve text from a text widget via its method get(start[,end]). This
returns the text from index start up to (but not including!) index end. If end is
omitted, get returns the single character at index start:

TextWidget.get("1.0",Tkinter.END) # Get ALL of the text
TextWidget.get("3.0","4.0") # Get line 3
TextWidget.get("1.5") # Get the 6th character only
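To make the line.column arithmetic concrete, here is a plain-Python model of how such an index maps onto a string. IndexToOffset and GetText are hypothetical helpers, not Tkinter methods. They ignore special indices such as END, and they don't clamp out-of-range columns the way the real widget does.

```python
def IndexToOffset(Text, Index):
    # Convert a "line.column" index (lines count from 1, columns from 0)
    # into a position in the Python string Text.
    LineText, ColumnText = Index.split(".")
    Line, Column = int(LineText), int(ColumnText)
    Lines = Text.split("\n")
    # Skip the earlier lines, counting one extra character for each "\n".
    Offset = sum(len(Previous) + 1 for Previous in Lines[:Line - 1])
    return Offset + Column

def GetText(Text, Start, End=None):
    # Like get(start[,end]): End is excluded; with no End, return one character.
    Begin = IndexToOffset(Text, Start)
    if End is None:
        return Text[Begin:Begin + 1]
    return Text[Begin:IndexToOffset(Text, End)]

Contents = "Spam\nEggs\nBeans\n"
print(repr(GetText(Contents, "3.0", "4.0")))  # line 3, newline included
print(GetText(Contents, "1.2"))               # column 2 of line 1
```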
The method delete(start[,end]) deletes text from the widget. The indexes start
and end function as they do for the get method. The method insert(pos,str)
inserts the string str just before the index pos:

TextWidget.insert("1.0","Bob") # Prepend Bob to the text
TextWidget.insert(Tkinter.END,"Bob") # Append Bob to the text

# Insert Bob wherever the mouse is pointing:
TextWidget.insert(Tkinter.CURRENT,"Bob")

# Clear the widget (remove all text):
TextWidget.delete("1.0",Tkinter.END)


Building Menus 

To build a menu in Tkinter, you use a menu widget (Tkinter.Menu). You then flesh
out the menu by adding entries. The method add_command(label=?,command=?)
adds a menu line with the specified label. When the user chooses the menu line, the
specified command is executed. add_separator adds a separator line to a menu,
suitable for grouping commands.

A call to add_cascade(label=?,menu=?) attaches the specified menu as a
submenu of the current menu. And add_checkbutton(label=?[,...]) adds a check
button to the menu. You can pass other options for the new Checkbutton widget
(such as variable) to add_checkbutton.

Create one instance of Menu to represent the menu bar itself, and then create one
Menu instance for each “real” menu. Unlike most widgets, a menu is never packed.
Instead, you attach it to a window using the menu option of a Toplevel widget, as
shown in the following example:

root=Tkinter.Tk()
MenuBar=Tkinter.Menu(root) # Menu bar must be child of Toplevel
root["menu"]=MenuBar # attach menubar to window!

FileMenu=Tkinter.Menu(MenuBar) # Submenu is child of menubar
FileMenu.add_command(label="Load",command=LoadFile)
FileMenu.add_command(label="Save",command=SaveFile)

HelpMenu=Tkinter.Menu(MenuBar)
HelpMenu.add_command(label="Contents",command=HelpIndex)

# Attach menus to menubar:
MenuBar.add_cascade(label="File",menu=FileMenu)
MenuBar.add_cascade(label="Help",menu=HelpMenu)

You can create pop-up menus in Tkinter. Call the menu method
tk_popup(x,y[,default]) to bring a menu up as a pop-up. The pop-up is
positioned at (x,y). If default is supplied, the pop-up menu starts with the specified label
selected. Listing 19-4 shows a pop-up menu in action:

Listing 19-4: Popup.py


import Tkinter

def MenuCommand():
    print "Howdy!"

def ShowMenu():
    # Pop the menu up at the mouse pointer's screen position:
    PopupMenu.tk_popup(*root.winfo_pointerxy())

root=Tkinter.Tk()
PopupMenu=Tkinter.Menu(root)
PopupMenu.add_command(label="X",command=MenuCommand)
PopupMenu.add_command(label="Y",command=MenuCommand)
Tkinter.Button(root,text="Popup",command=ShowMenu).pack()
root.mainloop()


Using Tkinter Dialogs 

The module tkMessageBox provides several functions that display a pop-up
message box. Each takes title and message parameters to control the window’s
title and the message displayed.


Table 19-1
Message Boxes

Function          Description
showinfo          Shows an informational message.
showwarning       Displays a warning message.
showerror         Displays an error message.
askyesno          Displays Yes and No buttons. Returns true if the user chose Yes.
askokcancel       Displays OK and Cancel buttons. Returns true if the user chose OK.
askretrycancel    Displays Retry and Cancel buttons. Returns true if the user chose Retry.
askquestion       Same as askyesno, but returns "yes" or "no" as a string.


This snippet of code uses tkMessageBox to get user confirmation before quitting: 

def Quit(self):
    if self.FileModified:
        if (not tkMessageBox.askyesno("Confirm",\
            "File modified. Really quit?")):
            return # don't quit!
    sys.exit()
File dialogs

The module tkFileDialog provides functions to bring up file-selection dialogs.
The function askopenfilename lets the user choose an existing file. The function
asksaveasfilename lets the user choose an existing file or provide a new file
name. Both functions return the full path to the selected file (or an empty string, if
the user cancels out).

Optionally, pass a filetypes parameter to either function, to limit the search to
particular file types. The parameter should be a list of tuples, where each tuple has the
form (description,pattern):

MusicFileName=tkFileDialog.askopenfilename(
    filetypes=[("Music files","*.mp3")])


Example: Text Editor 

The example in Listing 19-5 is a simple text editor. With it, you can open, save, and
edit text files. The code illustrates the use of the text widget, Tkinter menus, and
some of Tkinter’s standard dialog boxes. Figure 19-4 shows what the text editor
looks like.


Listing 19-5: TextEditor.py 


import Tkinter 
itnport tkFileDialog 
import tkMessageBox 
import os 
import sys 

# Filetype selections for askopenfil ename and asksaveasfi 1 ename: 
TEXT_FILE_TYPES=[("Text fi 1 es"."txt"),("Al 1 files".)] 

class TextEditor:
    def __init__(self):
        self.FileName=None
        self.CreateWidgets()
    def CreateWidgets(self):
        self.root=Tkinter.Tk()
        self.root.title("New file")
        MainFrame=Tkinter.Frame(self.root)
        # Create the File menu:
        MenuFrame=Tkinter.Frame(self.root)
        MenuFrame.pack(side=Tkinter.TOP,fill=Tkinter.X)
        FileMenuButton=Tkinter.Menubutton(MenuFrame,
            text="File",underline=0)
        FileMenuButton.pack(side=Tkinter.LEFT,anchor=Tkinter.W)
        FileMenu=Tkinter.Menu(FileMenuButton,tearoff=0)
        FileMenu.add_command(label="New",underline=0,
            command=self.ClearText)
        FileMenu.add_command(label="Open",underline=0,command=self.Open)
        FileMenu.add_command(label="Save",underline=0,command=self.Save)
        FileMenu.add_command(label="Save as...",underline=5,
            command=self.SaveAs)
        FileMenu.add_separator()
        self.FixedWidthFlag=Tkinter.BooleanVar()
        FileMenu.add_checkbutton(label="Fixed-width",
            variable=self.FixedWidthFlag,command=self.SetFont)
        FileMenu.add_separator()
        FileMenu.add_command(label="Exit",underline=1,command=sys.exit)
        FileMenuButton["menu"]=FileMenu
        # Create Help menu:
        HelpMenuButton=Tkinter.Menubutton(MenuFrame,text="Help",underline=0)
        HelpMenu=Tkinter.Menu(HelpMenuButton,tearoff=0)
        HelpMenu.add_command(label="About",underline=0,command=self.About)
        HelpMenuButton["menu"]=HelpMenu
        HelpMenuButton.pack(side=Tkinter.LEFT,anchor=Tkinter.W)
        # Create the main text field:
        self.TextBox=Tkinter.Text(MainFrame)
        self.TextBox.pack(fill=Tkinter.BOTH,expand=Tkinter.YES)
        # Pack the top-level widget:
        MainFrame.pack(fill=Tkinter.BOTH,expand=Tkinter.YES)
    def SetFont(self):
        if (self.FixedWidthFlag.get()):
            self.TextBox["font"]="Courier"
        else:
            self.TextBox["font"]="Helvetica"
    def About(self):
        tkMessageBox.showinfo("About textpad...","Hi, I'm a textpad!")
    def ClearText(self):
        self.TextBox.delete("1.0",Tkinter.END)
    def Open(self):
        FileName=tkFileDialog.askopenfilename(filetypes=TEXT_FILE_TYPES)
        if (FileName==None or FileName==""):
            return
        try:
            File=open(FileName,"r")
            NewText=File.read()
            File.close()
            self.FileName=FileName
            self.root.title(FileName)
        except IOError:
            tkMessageBox.showerror("Read error...",
                "Could not read from '%s'"%FileName)
            return
        self.ClearText()
        self.TextBox.insert(Tkinter.END,NewText)
    def Save(self):
        if (self.FileName==None or self.FileName==""):
            self.SaveAs()
        else:
            self.SaveToFile(self.FileName)
    def SaveAs(self):
        FileName=tkFileDialog.asksaveasfilename(filetypes=TEXT_FILE_TYPES)
        if (FileName==None or FileName==""):
            return
        self.SaveToFile(FileName)
    def SaveToFile(self,FileName):
        try:
            File=open(FileName,"w")
            NewText=self.TextBox.get("1.0",Tkinter.END)
            File.write(NewText)
            File.close()
            self.FileName=FileName
            self.root.title(FileName)
        except IOError:
            tkMessageBox.showerror("Save error...",
                "Could not save to '%s'"%FileName)
            return
    def Run(self):
        self.root.mainloop()

if (__name__=="__main__"):
    TextEditor().Run()
Figure 19-4: A text editor with dialogs


Handling Colors and Fonts

You can customize the color (or colors) of your widgets, as well as the font used to 
paint widget text. 

Colors 

Colors are defined using three numbers, which specify the intensities of red,
green, and blue. Tkinter accepts colors as hexadecimal strings of the form #RGB,
#RRGGBB, or #RRRGGGBBB. For example, #FFFFFF is white, #000000 is black, and
#FF00FF is purple (magenta). The longer the string, the more precisely you can
specify a color.

Tkinter also provides many predefined colors; for example, red and green are
valid color names. The list also includes some exotic colors, such as thistle3 and
burlywood2.
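Because a color string is just three hexadecimal numbers run together, converting to and from (red, green, blue) intensities takes only a few lines. The following is a minimal sketch; rgb_to_color_string and color_string_to_rgb are hypothetical helper names, not part of Tkinter.

```python
# A minimal sketch (not part of Tkinter): convert between red, green, and
# blue intensities and the #-style color strings Tkinter accepts.

def rgb_to_color_string(red, green, blue):
    """Return a "#RRGGBB" string for three intensities in the range 0-255."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("intensity out of range: %r" % (value,))
    return "#%02X%02X%02X" % (red, green, blue)

def color_string_to_rgb(color):
    """Parse "#RGB", "#RRGGBB", or "#RRRGGGBBB" into three integers.

    Each integer keeps the precision of the original string, so "#F0F"
    gives (15, 0, 15) while "#FF00FF" gives (255, 0, 255).
    """
    if not color.startswith("#"):
        raise ValueError("not a #-style color: %r" % (color,))
    digits = color[1:]
    if len(digits) not in (3, 6, 9):
        raise ValueError("unexpected length: %r" % (color,))
    width = len(digits) // 3  # hex digits per component
    return tuple(int(digits[i * width:(i + 1) * width], 16) for i in range(3))
```

For example, rgb_to_color_string(255, 0, 255) gives "#FF00FF", the purple from the text.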
Fonts

Font descriptors are tuples of the form (family,size[,styles]). For example,
the following lines display a button whose label is in Helvetica 24-point italics:

root=Tkinter.Tk()
Tkinter.Button(root,text="Fancy",
    font=("Helvetica",24,"italic")).pack()

If the name of a font family does not contain spaces, a string of the form "family
size styles" is an equivalent font descriptor. You can also use X font descriptors:

Tkinter.Button(root,text="Fixed-width",
    font="-*-Courier-bold-r-*-*-12-*-*-*-*-*-*-*").pack()
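The tuple and string descriptors carry the same information, so one can be built from the other. This sketch assumes the tuple holds a family, a size, and zero or more style words; font_tuple_to_string is a hypothetical helper, not a Tkinter function.

```python
# A minimal sketch: build the "family size styles" string form of a font
# descriptor from the tuple form. As the text notes, the string form is
# only equivalent when the family name contains no spaces.

def font_tuple_to_string(font):
    family, size = font[0], font[1]
    styles = list(font[2:])  # zero or more style words, e.g. "italic"
    if " " in family:
        raise ValueError("use the tuple form for %r" % (family,))
    return " ".join([family, str(size)] + styles)
```

With this helper, ("Helvetica",24,"italic") becomes "Helvetica 24 italic".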


Drawing Graphics 

The PhotoImage class enables you to add images to your user interface. Images in
GIF, PPM, and PGM format are supported. The constructor enables you (optionally)
to name the image. You can also specify a file to read the image from, or pass in raw
image data:

MisterT=Tkinter.PhotoImage("Mr. T",file="mrt.gif")

# Another way to get the same image:
ImageFile=open("mrt.gif","rb") # image data is binary
ImageData=ImageFile.read()
ImageFile.close()
MisterT=Tkinter.PhotoImage(data=ImageData) # no name

Once you have a PhotoImage object, you can attach it to a label or button using the
image option:

MisterT=Tkinter.PhotoImage(file="mrt.gif")
Tkinter.Button(root,image=MisterT).pack()

You can query the size of a PhotoImage using the width and height methods.

Note: You can construct PhotoImage objects only after you instantiate a TopLevel
instance.

The canvas widget 

The canvas widget (Tkinter.Canvas) is a window in which you can programmatically
draw ovals, rectangles, lines, and so on. For example, the following code
draws a smiley face:

Figure=Tkinter.Canvas(root,width=50,height=50)
Figure.pack()
Figure.create_line(10,10,10,20)
Figure.create_line(40,10,40,20)
Figure.create_arc(5,15,45,45,start=200,extent=140,
    style=Tkinter.ARC)


Several different canvas items are available for your drawing pleasure:

create_line(x1,y1,x2,y2,...,xn,yn)
    Draws lines connecting the points (x1,y1) through (xn,yn), in order. The lines
    are normally straight; set the smooth option to true to draw smooth lines.

create_polygon(x1,y1,x2,y2,...,xn,yn)
    Similar to create_line, but closes the shape and fills the area spanned by the
    lines with the color supplied for the fill option (black by default; pass an
    empty string for no fill). Pass a color for the outline option to control the
    line color. Set the smooth option to true to draw smooth lines.

create_image(x,y,image=?[,anchor=?])
    Draws the specified image on the canvas at (x,y). The image option can be
    either a PhotoImage instance or the name of a previously created PhotoImage.
    The anchor option, which defaults to CENTER, specifies which portion of the
    image lies at (x,y).

create_oval(x1,y1,x2,y2)
    Draws an oval inside the rectangle defined by the points (x1,y1) and (x2,y2).
    Pass a color in the outline option to control the outline’s color. Pass a
    color in the fill option to fill the oval with that color. You can control the
    outline’s width (in pixels) with the width option.

create_rectangle(x1,y1,x2,y2)
    Draws a rectangle. The fill, outline, and width options have the same effect
    as for create_oval.

create_text(x,y,text=?[,font=?])
    Draws the specified text on the canvas, using the supplied font, if any.
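Note that create_oval takes a bounding rectangle rather than a center and a radius. Drawing a circle therefore means computing that rectangle first; circle_bbox below is a hypothetical helper (not a canvas method) doing the same arithmetic that DrawOval uses, with a radius of 5, in Listing 20-1.

```python
# A minimal sketch: compute the bounding rectangle that create_oval
# expects for a circle centered on (x, y).

def circle_bbox(x, y, radius):
    """Return (x1, y1, x2, y2) for a circle of the given radius at (x, y)."""
    return (x - radius, y - radius, x + radius, y + radius)
```

On a real canvas you would pass the result along, as in Figure.create_oval(*circle_bbox(25,25,20)).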


Manipulating canvas items 

The items drawn on a canvas behave much like widgets in their own right: they can be
moved around, have events bound to them, and so on. The create_* methods return an
ID for the canvas item. You can use that ID to manipulate the canvas item, using the
canvas’s methods. For example, the canvas method delete(ID) deletes the
specified item. The method move(ID,DeltaX,DeltaY) moves the canvas item
horizontally by DeltaX units, and vertically by DeltaY units.
Using Timers

Tkinter also provides a timer mechanism. Call the method after(wait,function)
on a TopLevel widget to make the specified function execute after wait
milliseconds. To make a timed action recur (for example, once every five minutes),
make another call to after at the end of function. For example, the code in
Listing 19-6 calls a function ten seconds after the program starts, and then once a
minute after that:


Listing 19-6: Timer.py 


import Tkinter

def MinuteElapsed():
    print "Ding!"
    # Schedule the next call, one minute from now:
    root.after(1000*60,MinuteElapsed)

root=Tkinter.Tk()
root.after(10000,MinuteElapsed)
root.mainloop()
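The re-arming pattern is easier to see on a simulated clock. The sketch below does not use Tkinter at all: FakeRoot is a hypothetical stand-in whose after method only queues callbacks, and run_until plays them back in time order. Under that assumption, MinuteElapsed (with its print replaced by a list append) fires at 10, 70, and 130 seconds during the first three minutes.

```python
import heapq

# A minimal sketch, NOT Tkinter: FakeRoot is a hypothetical stand-in for a
# Tk root window. after(wait, function) queues a callback, and
# run_until(limit) plays the queue forward on a simulated millisecond clock.

class FakeRoot(object):
    def __init__(self):
        self.now = 0          # simulated time, in milliseconds
        self._queue = []      # heap of (due_time, sequence, function)
        self._sequence = 0    # tie-breaker so equal due times run in order

    def after(self, wait, function):
        heapq.heappush(self._queue, (self.now + wait, self._sequence, function))
        self._sequence += 1

    def run_until(self, limit):
        while self._queue and self._queue[0][0] <= limit:
            due, _, function = heapq.heappop(self._queue)
            self.now = due
            function()

root = FakeRoot()
fired_at = []

def MinuteElapsed():
    fired_at.append(root.now)
    root.after(1000 * 60, MinuteElapsed)  # re-arm, as in Listing 19-6

root.after(10000, MinuteElapsed)
root.run_until(3 * 60 * 1000)  # simulate the first three minutes
```

Each call schedules its successor, so the timer keeps running only as long as the callback keeps re-arming it.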


Example: A Bouncing Picture 

The program in Listing 19-7 displays a picture that moves around, bouncing off the
sides of the window, as shown in Figure 19-5. It uses a PhotoImage object and a
canvas to handle the display and the TopLevel after method to schedule calls to
MoveImage.


Listing 19-7: CanvasBounce.py 


import Tkinter

class Bouncer:
    def __init__(self,Master):
        self.Master=Master
        self.X=0
        self.Y=0
        self.DeltaX=5
        self.DeltaY=5
        self.Figure=Tkinter.Canvas(self.Master)
        self.GrailWidth=GrailPicture.width()
        self.GrailHeight=GrailPicture.height()
        self.GrailID=self.Figure.create_image(
            0,0,anchor=Tkinter.NW,image=GrailPicture)
        self.Figure.pack(fill=Tkinter.BOTH,expand=Tkinter.YES)
        # Move the image after 100 milliseconds:
        self.Master.after(100,self.MoveImage)
    def MoveImage(self):
        # Move the image:
        self.X+=self.DeltaX
        self.Y+=self.DeltaY
        self.Figure.coords(self.GrailID,self.X,self.Y)
        # Bounce off the sides:
        if (self.X<0):
            self.DeltaX=abs(self.DeltaX)
        if (self.Y<0):
            self.DeltaY=abs(self.DeltaY)
        if (self.X+self.GrailWidth>self.Figure.winfo_width()):
            self.DeltaX=-abs(self.DeltaX)
        if (self.Y+self.GrailHeight>
                self.Figure.winfo_height()):
            self.DeltaY=-abs(self.DeltaY)
        # Do it again after 100 milliseconds:
        self.Master.after(100,self.MoveImage)

if (__name__=="__main__"):
    root=Tkinter.Tk()
    # Bouncer reads this global, so create it first:
    GrailPicture=Tkinter.PhotoImage(file="HolyGrail.gif")
    Bouncer(root)
    root.mainloop()



Figure 19-5: A bouncing picture 
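The bounce rule in MoveImage does not depend on Tkinter, so it can be pulled out and tried on its own. In this sketch, bounce_step is a hypothetical name; canvas_width and canvas_height stand in for winfo_width() and winfo_height(), and image_width and image_height for the PhotoImage size.

```python
# A minimal sketch of the movement-and-bounce rule from MoveImage in
# Listing 19-7, written as a pure function.

def bounce_step(x, y, dx, dy, image_width, image_height,
                canvas_width, canvas_height):
    """Move one step, then return the new (x, y, dx, dy)."""
    x += dx
    y += dy
    # As in the listing, the position is updated first; the direction
    # flips only once the image has crossed an edge.
    if x < 0:
        dx = abs(dx)
    if y < 0:
        dy = abs(dy)
    if x + image_width > canvas_width:
        dx = -abs(dx)
    if y + image_height > canvas_height:
        dy = -abs(dy)
    return x, y, dx, dy
```

Using abs rather than simply negating the delta means a picture that is still past an edge on the next step keeps heading back inward instead of flipping direction again.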
Summary

After working with Tkinter, you will understand why it is so popular. Creating and
customizing an interface is simple. In this chapter, you:

♦ Created a GUI with buttons, labels, menus, and other Tkinter widgets.

♦ Used Tkinter’s standard dialogs.

♦ Set up timers.

♦ Drew pictures on a canvas.

The next chapter delves into Tkinter in more detail. It covers events, drag-and-drop
operations, and some more widgets.



CHAPTER 20

Using Advanced Tkinter Widgets

In This Chapter

♦ Handling events
♦ Advanced widgets
♦ Creating dialogs
♦ Supporting drag-and-drop operations
♦ Using cursors
♦ Designing new widgets
♦ Further Tkinter adventures

This chapter introduces some of Tkinter’s fancier features: custom event
handlers, advanced widgets, and more. Tkinter scales up painlessly from
quick-and-dirty interfaces to sophisticated, full-featured applications.


Handling Events

A GUI program spends most of its time waiting for something to happen. When
something does happen (the user clicking the mouse, for example), events are
sent to the affected widget(s). Events are sometimes called messages or
notifications. A widget responds to an event using a function called an event
handler.

Creating event handlers

Often, Tkinter’s standard event handlers are good enough. As you saw in the last
chapter, you can create an interesting UI without ever writing event handlers.
However, you can always define a custom event handler for a widget. To define
a custom handler, call the widget method bind(EventCode,Handler[,Add=None]).
Here, EventCode is a string identifying the event, and Handler is a function to
handle the event. Passing a value of "+" for Add causes the new handler to be
added to any existing event binding, instead of replacing it.

You can also bind event handlers for a particular widget class with a call to
bind_class(ClassName,EventCode,Handler[,Add]), or bind event handlers for
application-level events with bind_all(EventCode,Handler[,Add]).

When the widget receives a matching event, Handler is called and passed one
argument: an event object. For example, the following code creates a label that
beeps when you click it:
BeepLabel=Tkinter.Label(root,text="Click me!")
BeepHandler=lambda Event,Root=root:Root.bell()
BeepLabel.bind("<Button-1>",BeepHandler)
BeepLabel.pack()


Binding mouse events 

Mouse buttons are numbered: 1 is the left button, 2 is the middle button (if any),
and 3 is the right button. Table 20-1 lists the most common mouse event codes.



Table 20-1
Mouse Events

Event code            Description
<Button-1>            Button 1 was pressed on the widget. Similarly for
                      <Button-2> and <Button-3>.
<B1-Motion>           The mouse pointer was dragged over the widget, with
                      button 1 pressed.
<ButtonRelease-1>     Button 1 was released over the widget.
<Double-Button-1>     Button 1 was double-clicked over the widget.


Binding keyboard events 

The event code <Key> matches any keypress. You can also match a particular key,
generally by using that key’s character as an event code. For example, the event
code x matches a press of the x key. Some keystrokes have special event codes.
Table 20-2 lists the event codes for some of the most common special keystrokes.


Table 20-2
Common Special Keystrokes

Event code                  Keystroke
<Up>                        Up arrow key
<Down>                      Down arrow key
<Left>                      Left arrow key
<Right>                     Right arrow key
<F1>                        Function key 1
<Shift_L>, <Shift_R>        Left and right Shift keys
<Control_L>, <Control_R>    Left and right Control keys
<space>                     Spacebar


Event objects 

An event object, as passed to an event handler, has various attributes that specify
just what happened. The attribute widget is a reference to the affected widget.

For mouse events, the attributes x and y are the coordinates of the mouse pointer, 
in pixels, as measured from the top-left corner of the widget. The attributes x_root 
and y_root are mouse pointer coordinates, as measured from the top-left corner of 
the screen. 

For keyboard events, the attribute char is the character code, as a string. 


Example: A Drawing Canvas 

The program in Listing 20-1 provides a canvas on which you can draw shapes by
left- and right-clicking. In addition, you can move the Quit button around by using
the arrow keys. Figure 20-1 shows the program in action.


Listing 20-1: Events.py 


import Tkinter
import sys

def DrawOval(Event):
    # Event.widget will be the main canvas:
    Event.widget.create_oval(Event.x-5,Event.y-5,
        Event.x+5,Event.y+5)
def DrawRectangle(Event):
    Event.widget.create_rectangle(Event.x-5,Event.y-5,
        Event.x+5,Event.y+5)
def MoveButton(Side):
    # The methods pack_forget() and grid_forget() unpack
    # a widget, but (unlike the destroy() method)
    # do not destroy it; it can be redisplayed later.
    QuitButton.pack_forget()
    QuitButton.pack(side=Side)

root=Tkinter.Tk()
MainCanvas=Tkinter.Canvas(root)
MainCanvas.bind("<Button-1>",DrawOval)
MainCanvas.bind("<Button-3>",DrawRectangle)
MainCanvas.pack(fill=Tkinter.BOTH,expand=Tkinter.YES)
QuitButton=Tkinter.Button(MainCanvas,text="Quit",
    command=sys.exit)
QuitButton.pack(side=Tkinter.BOTTOM)
root.bind("<Up>",lambda e:MoveButton(Tkinter.TOP))
root.bind("<Down>",lambda e:MoveButton(Tkinter.BOTTOM))
root.bind("<Left>",lambda e:MoveButton(Tkinter.LEFT))
root.bind("<Right>",lambda e:MoveButton(Tkinter.RIGHT))
root.geometry("300x300") # Set the initial window size
root.mainloop()




Figure 20-1: A canvas with custom mouse and keyboard event handlers

















Chapter 20 ♦ Using Advanced Tkinter Widgets 375 


Advanced Widgets 

This section introduces three more widgets for your Tkinter widget toolbox:
listbox, scale, and scrollbar.

Listbox 

A listbox (Tkinter.Listbox) displays a list of options. Each option is a string, and 
each takes up one row in the listbox. Each item is assigned an index (starting from 0). 

The option selectmode governs what kind of selections the user can make. SINGLE
allows one row to be selected at a time; MULTIPLE permits the user to select many
rows at once. BROWSE (the default) is similar to SINGLE, but allows the user to drag
the mouse cursor across rows. EXTENDED is similar to MULTIPLE, but allows fancier
selections to be made by Control- and Shift-clicking.

The option height, which defaults to 10, specifies how many rows a listbox displays 
at once. If a listbox contains more rows than it can display at once, you should 
attach a scrollbar — see the section “Scrollbar” for details. 

Editing listbox contents 

To populate the listbox, call the method insert(before,element[,...]). This
inserts one or more elements (which must be strings!) prior to index before. Use the
special index Tkinter.END to append the new item(s) to the end of the listbox.

The method delete(first[,last]) deletes all items from index first to index last,
inclusive. If last is not specified, the single item with index first is deleted.

Checking listbox contents 

The method size returns the number of items in the listbox.

The method get(first[,last]) retrieves the items from index first to index last,
inclusive, as a tuple of strings; if last is omitted, the single item with index
first is returned.

The method nearest(y) returns the index of the row closest to the specified 
y-coordinate. This is useful for determining what row a user is clicking. 


Checking and changing the selection 

The method curselection returns the current selection, in the form of a sequence of
indices. If no row is selected, curselection returns an empty sequence. The method
selection_includes(index) returns true if the item with the specified index is
selected.

The method selection_set(first[,last]) selects the items from index first to
index last, inclusive. The method selection_clear(first[,last]) deselects the
specified items.



When you specify a range of listbox indices, the range is inclusive, not exclusive
(unlike a Python slice). For example, MyList.selection_set(2,3) selects the items
with index 2 and 3.
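The inclusive rule is the opposite of Python slicing, which is an easy source of off-by-one errors. This sketch uses a plain Python list in place of a listbox; listbox_get is a hypothetical helper that mimics the get behavior described above, translating the inclusive range into an exclusive slice with last+1.

```python
# A minimal sketch: mimic Listbox.get(first[,last]) on a plain list.
# Listbox ranges include index last; Python slices exclude their end.

def listbox_get(items, first, last=None):
    """Return one item if last is omitted, else an inclusive tuple."""
    if last is None:
        return items[first]
    return tuple(items[first:last + 1])
```

So listbox_get(items, 2, 3) covers the same two rows that MyList.selection_set(2,3) selects.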


Scale 


A scale widget (Tkinter.Scale) looks like a sliding knob. The user drags the
slider to set a numeric value. You can attach a scale to a Tkinter variable (using the
variable option), or use its get and set methods to access its value directly.

Range and precision 

The options from and to specify the numeric range available; the default is the
range from 0 to 100. The option resolution is the smallest possible change the user
can make in the numeric value. By default, resolution is 1 (so that the scale’s value
is always an integer).



Remember to use from_, not from, when passing the "from" option as a keyword 
argument. 

Widget size 

The option orient determines the direction in which the scale is laid out; valid
values are HORIZONTAL and VERTICAL. The option length specifies the length (in
pixels) of the scale; it defaults to 100. The option sliderlength determines the length
of the sliding knob; it defaults to 30.


Labeling 


By default, a scale displays the current numeric value above (or to the left of) the 
sliding scale. Set the showvalue option to false to disable this display. 

You can label the axis with several tick-marks. To do so, pass the distance between 
ticks in the option tickinterval. 


Scrollbar 


A scrollbar widget (Tkinter.Scrollbar) is used in conjunction with another widget
when that widget has more to show than it can display all at once. The scrollbar
enables the user to scroll through the available information.


The orient option determines the scrollbar’s orientation; valid values are VERTICAL 
and HORIZONTAL. 
To attach a vertical scrollbar to a Listbox, Canvas, or Text widget, set the
scrollbar’s command option to the yview method of the widget. Then, set the widget’s
yscrollcommand option to the scrollbar’s set method. (To attach a horizontal
scrollbar, perform a similar procedure, but use xview and xscrollcommand.)

For example, the following two lines “hook together” a scrollbar (MyScrollbar) and 
a listbox (MyListbox): 

MyScrollbar["command"]=MyListbox.yview
MyListbox["yscrollcommand"]=MyScrollbar.set
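The hookup is a two-way conversation: the listbox reports which part of its contents is visible by calling the scrollbar’s set method with two fractions between 0.0 and 1.0, and the scrollbar calls yview when the user drags it. The sketch below only reproduces that first calculation for a listbox, assuming whole rows; visible_fractions is a hypothetical helper, not a Tkinter call.

```python
# A minimal sketch of the two fractions a listbox reports to its scrollbar,
# assuming total_rows rows, visible_rows shown at once, starting at top_row.

def visible_fractions(top_row, visible_rows, total_rows):
    if total_rows <= 0:
        return (0.0, 1.0)  # nothing to scroll: everything is "visible"
    first = float(top_row) / total_rows
    last = float(min(top_row + visible_rows, total_rows)) / total_rows
    return (first, last)
```

With 40 rows and the default height of 10, the view scrolled to the top covers the fractions 0.0 to 0.25, so the scrollbar’s slider fills a quarter of its trough.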


Example: Color Scheme Customizer 

Tkinter allows you to use a predefined color scheme. These colors are used as
defaults for the foreground and background options of widgets. The TopLevel
method option_readfile(filename) reads in default colors and fonts from a file.
You should call option_readfile as early in your program as possible, because it
doesn’t affect any widgets already displayed onscreen.

A typical line in the file has the form *Widget*foreground: Color, where Widget
is a widget class and Color is the default color for that sort of widget. The line
*foreground: Color sets a default foreground for all other widgets. Similar lines
set the default background colors.
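Reading such lines back into a dictionary needs nothing more than splitting on the colon, which is also the approach GetOptionsFromFile takes in Listing 20-2. This sketch works on a list of strings so that no file or Tk instance is needed; parse_option_lines is a hypothetical name, and real option files allow more syntax (comments, for instance) than it handles.

```python
# A minimal sketch: turn "pattern: value" lines such as
# "*Button*background: #C0C0C0" into a dictionary, skipping lines that
# do not split into exactly two non-empty halves.

def parse_option_lines(lines):
    options = {}
    for line in lines:
        halves = line.split(":")
        if len(halves) != 2:
            continue  # not a simple "pattern: value" line
        pattern, value = halves[0].strip(), halves[1].strip()
        if pattern and value:
            options[pattern] = value
    return options
```

Because #RRGGBB values contain no colons, a single split is enough for color settings.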

The example shown in Listing 20-2 lets you define a new color scheme. It uses a
listbox, a scrollbar, and three sliding scales (for setting red, green, and blue levels).
See Figure 20-2 for an example.


Listing 20-2: ColorChooser.py 


import Tkinter
import os
import sys

WIDGET_NAMES=["Entry","Label","Menu","Text","Button","Listbox","Scale",
    "Scrollbar","Canvas"]
OPTION_FILE_NAME="TkinterColors.ini"
COLOR_COMPONENTS=["Red","Green","Blue"]

class ColorChooser:
    def __init__(self):
        self.root=Tkinter.Tk()
        # Dictionary of options and values - corresponds to
        # the option database (TkinterColors.ini):
        self.Options={}
        # Flag linked to the "Option set" checkbox:
        self.OptionSetFlag=Tkinter.BooleanVar()
        self.GetOptionsFromFile()
        self.BuildWidgets()
        self.SelectedColorItem=None
        self.SelectNewColorItem(0)
    def SaveCurrentColorValues(self):
        "Use Scale-widget values to set internal color value"
        if (self.SelectedColorItem!=None):
            if (self.OptionSetFlag.get()):
                ColorString="#"
                for ColorComponent in COLOR_COMPONENTS:
                    ColorString+="%02X"%self.ColorValues[ColorComponent].get()
                self.Options[self.SelectedColorItem]=ColorString
            else:
                # The user un-checked the "Option set" box:
                if (self.Options.has_key(self.SelectedColorItem)):
                    del self.Options[self.SelectedColorItem]
    def UpdateControlsFromColorValue(self):
        "Use internal color value to update Scale widgets"
        if (self.SelectedColorItem!=None and self.OptionSetFlag.get()):
            ColorString=self.Options.get(self.SelectedColorItem,"")
            if len(ColorString)!=7:
                ColorString="#000000" # default
        else:
            ColorString="#000000"
        RedValue=int(ColorString[1:3],16)
        self.ColorValues["Red"].set(RedValue)
        GreenValue=int(ColorString[3:5],16)
        self.ColorValues["Green"].set(GreenValue)
        BlueValue=int(ColorString[5:],16)
        self.ColorValues["Blue"].set(BlueValue)
    def OptionChecked(self):
        "Callback for clicking the 'Option set' checkbox."
        if (self.OptionSetFlag.get()):
            self.EnableColorScales()
        else:
            self.DisableColorScales()
    def EnableColorScales(self):
        for ColorComponent in COLOR_COMPONENTS:
            self.ColorScales[ColorComponent]["state"]=Tkinter.NORMAL
    def DisableColorScales(self):
        for ColorComponent in COLOR_COMPONENTS:
            self.ColorScales[ColorComponent]["state"]=Tkinter.DISABLED
    def SelectNewColorItem(self,NewIndex):
        """Choose a new color item - save the current item, select the
        new entry in the listbox, and update the scale-widgets from the
        new entry."""
        self.SaveCurrentColorValues()
        self.SelectedColorItem=self.ItemList.get(NewIndex)
        self.ItemList.activate(NewIndex)
        # Clear any old selection, so only the new row is highlighted:
        self.ItemList.selection_clear(0,Tkinter.END)
        self.ItemList.selection_set(NewIndex)
        self.OptionSetFlag.set(self.Options.has_key(self.SelectedColorItem))
        self.OptionChecked()
        self.UpdateControlsFromColorValue()
    def ListboxClicked(self,ClickEvent):
        "Event handler for choosing a new Listbox entry"
        NewIndex=self.ItemList.nearest(ClickEvent.y)
        self.SelectNewColorItem(NewIndex)
    def BuildWidgets(self):
        "Set up all the application widgets."
        self.LeftPane=Tkinter.Frame(self.root)
        self.RightPane=Tkinter.Frame(self.root)
        self.ItemList=Tkinter.Listbox(self.LeftPane,
            selectmode=Tkinter.SINGLE)
        self.ItemList.pack(side=Tkinter.LEFT,expand=Tkinter.YES,
            fill=Tkinter.Y)
        self.ListBoxScroller=Tkinter.Scrollbar(self.LeftPane)
        self.ListBoxScroller.pack(side=Tkinter.RIGHT,expand=Tkinter.YES,
            fill=Tkinter.Y)
        # Add entries to listbox:
        self.ItemList.insert(Tkinter.END,"*foreground")
        self.ItemList.insert(Tkinter.END,"*background")
        for WidgetName in WIDGET_NAMES:
            self.ItemList.insert(Tkinter.END,"*%s*foreground"%WidgetName)
            self.ItemList.insert(Tkinter.END,"*%s*background"%WidgetName)
        # Attach scrollbar to listbox:
        self.ListBoxScroller["command"]=self.ItemList.yview
        self.ItemList["yscrollcommand"]=self.ListBoxScroller.set
        # Handle listbox selection events specially:
        self.ItemList.bind("<Button-1>",self.ListboxClicked)
        # Add checkbox for setting and un-setting the option:
        ColorSetCheck=Tkinter.Checkbutton(self.RightPane,
            text="Option set",variable=self.OptionSetFlag,
            command=self.OptionChecked)
        ColorSetCheck.pack(side=Tkinter.TOP,anchor=Tkinter.W)
        # Build red, green, and blue scales for setting colors:
        self.ColorValues={}
        self.ColorScales={}
        for ColorComponent in COLOR_COMPONENTS:
            ColorValue=Tkinter.IntVar()
            self.ColorValues[ColorComponent]=ColorValue
            NewScale=Tkinter.Scale(self.RightPane,
                orient=Tkinter.HORIZONTAL,from_=0,to=255,
                variable=ColorValue)
            self.ColorScales[ColorComponent]=NewScale
            Tkinter.Label(self.RightPane,text=ColorComponent).pack(
                side=Tkinter.TOP)
            NewScale.pack(side=Tkinter.TOP,pady=10)
        # Add "SAVE" and "QUIT" buttons:
        ButtonFrame=Tkinter.Frame(self.RightPane)
        ButtonFrame.pack()
        Tkinter.Button(ButtonFrame,text="Save",
            command=self.SaveOptionsToFile).pack(side=Tkinter.LEFT)
        Tkinter.Button(ButtonFrame,text="Quit",
            command=sys.exit).pack(side=Tkinter.LEFT)
        # Pack the parentmost widgets:
        self.LeftPane.pack(side=Tkinter.LEFT,expand=Tkinter.YES,
            fill=Tkinter.BOTH)
        self.RightPane.pack(side=Tkinter.RIGHT,expand=Tkinter.YES,
            fill=Tkinter.BOTH)
    def Run(self):
        self.root.mainloop()
    def SaveOptionsToFile(self):
        # Update internal color-settings from scale-widgets:
        self.SaveCurrentColorValues()
        File=open(OPTION_FILE_NAME,"w")
        # Save *foreground and *background first:
        for Key in ("*foreground","*background"):
            if self.Options.has_key(Key):
                File.write("%s: %s\n"%(Key,self.Options[Key]))
        # Then save everything else. Nothing is deleted from
        # self.Options, so a second save still includes every setting:
        for Key in self.Options.keys():
            if Key not in ("*foreground","*background"):
                File.write("%s: %s\n"%(Key,self.Options[Key]))
        File.close()
        print "Saved!"
    def GetOptionsFromFile(self):
        if os.path.exists(OPTION_FILE_NAME):
            # Read the colors in:
            File=open(OPTION_FILE_NAME,"r")
            for Line in File.readlines():
                LineHalves=Line.split(":")
                if len(LineHalves)!=2:
                    # Not a proper setting
                    continue
                Value=LineHalves[1].strip()
                Index=LineHalves[0].strip()
                self.Options[Index]=Value
            File.close()
            # Tell Tkinter to use these colors, too!
            self.root.option_readfile(OPTION_FILE_NAME)

if (__name__=="__main__"):
    ColorChooser().Run()
Figure 20-2: Using scales and listboxes to design a color scheme


Creating Dialogs 

Instead of using the standard dialogs (as described in Chapter 19), you can create
dialog boxes of your own. The module tkSimpleDialog provides a class, Dialog,
that you can subclass to create any dialog box. When you construct a Dialog
instance, the dialog is (synchronously) displayed, and the user can click OK or
Cancel. The constructor has the syntax Dialog(master[,title]).

Override the method body(master) with a method that creates the widgets in the
dialog body. If the body method returns a widget, that widget receives the initial
focus when the dialog is displayed. Override the apply method with a function to
be called when the user clicks OK.

In addition, you can create custom buttons by overriding the buttonbox method.
The buttons should call the ok and cancel methods. Binding <Return> to OK and
<Escape> to Cancel is also generally a good idea.

The example in Listing 20-3 displays a simple dialog when the user presses a button. 

























382 Part IV > User Interfaces and Multimedia 


Listing 20-3: Complaint.py


import Tkinter
import tkSimpleDialog

class ComplaintDialog(tkSimpleDialog.Dialog):
    def body(self,Master):
        # Put the widgets in Master, the dialog's body frame:
        Tkinter.Label(Master,
            text="Enter your complaint here:").pack()
        self.Complaint=Tkinter.Entry(Master)
        self.Complaint.pack()
        return self.Complaint # set initial focus here!
    def apply(self):
        self.ComplaintString=self.Complaint.get()

def Complain():
    # This next line doesn't return until the user
    # clicks "Ok" or "Cancel":
    UserDialog=ComplaintDialog(root,"Enter your complaint")
    if hasattr(UserDialog,"ComplaintString"):
        # They must have clicked "Ok", since
        # apply() got called.
        print "Complaint:",UserDialog.ComplaintString

root=Tkinter.Tk()
Tkinter.Button(root,text="I wish to register a complaint",
    command=Complain).pack()
root.mainloop()


Supporting Drag-and-Drop Operations 

The module Tkdnd provides simple drag-and-drop support for your Tkinter
applications. To implement drag-and-drop, you need to have suitable draggable objects,
and suitable targets. A draggable object (which can be a widget) should implement
a dnd_end method. A target can be any widget that implements the methods
dnd_accept, dnd_motion, dnd_enter, dnd_leave, and dnd_commit.

To support drag-and-drop, bind a handler for <ButtonPress> in the widget from
which you can drag. In the event handler, call Tkdnd.dnd_start(draggable,event),
where draggable is a draggable object and event is the event you are
handling. The call to dnd_start returns a drag-and-drop object. You can call this
object’s cancel method to cancel an in-progress drag; otherwise, you don’t use
the drag-and-drop object.
As the user drags the object around, Tkdnd constantly looks for a new target widget.
It checks the widget under the mouse cursor, then that widget’s parent, and so on.
When it sees a widget with a dnd_accept method, it calls dnd_accept(draggable,event),
where draggable is the object being dragged. If the call to dnd_accept
returns anything but None, the returned object (typically the widget itself) becomes
the new target.

Whenever the dragged object moves, one of the following happens:

♦ If the old target and the new target are both None, nothing happens.

♦ If the old and new targets are the same widget, its method
dnd_motion(draggable,event) is called.

♦ If the old target is None and the new target is not, the new target’s method
dnd_enter(draggable,event) is called.

♦ If the new target is None and the old target is not, the old target’s method
dnd_leave(draggable,event) is called.

♦ If the old and new targets are not None and are different, dnd_leave is called
on the old one and then dnd_enter is called on the new one.

If the draggable object is dropped on a valid target, dnd_commit(draggable, event) is called on that target. If the draggable object is not dropped on a valid target, dnd_leave is called on the previous target (if any). In either case, a call to dnd_end(target, event) is made on the draggable object when the user drops it.
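The transition rules above are easy to lose track of, so here is a minimal, hypothetical model of them in plain Python (no Tkinter or display needed). Given the target under the cursor before and after a mouse move, it returns which method would be called on which target. The function name dnd_transition is made up for illustration; it is not part of Tkdnd.

```python
def dnd_transition(old_target, new_target):
    """Return a list of (target, method_name) pairs for one mouse move.

    Hypothetical model of the rules listed above; None means
    "no target under the cursor".
    """
    # Both None: nothing happens.
    if old_target is None and new_target is None:
        return []
    # Same widget as before: it just gets a motion notification.
    if old_target is new_target:
        return [(new_target, "dnd_motion")]
    calls = []
    # Leaving the old target always comes before entering the new one.
    if old_target is not None:
        calls.append((old_target, "dnd_leave"))
    if new_target is not None:
        calls.append((new_target, "dnd_enter"))
    return calls
```

Walking a drag from empty space into one listbox and then across to another yields dnd_enter, then dnd_motion calls, then a dnd_leave/dnd_enter pair, which is exactly the order the real listboxes in Listing 20-4 rely on.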

The program in Listing 20-4 illustrates drag-and-drop through the use of two custom listboxes. Entries can be dragged around within a listbox, or dragged between listboxes. Figure 20-3 shows what the program looks like.


Listing 20-4: DragAndDrop.py 


import Tkinter
import Tkdnd

class DraggableRow:
    def __init__(self,Index,ItemStr,Widget):
        self.Index=Index
        self.ItemStr=ItemStr
        self.Widget=Widget
        self.PreviousWidget=Widget
    def dnd_end(self,Target,Event):
        if Target==None:
            # Put the item back in its original widget!
            self.PreviousWidget.insert(Tkinter.END,
                self.ItemStr)

class DragAndDropListbox(Tkinter.Listbox):
    def __init__(self,Master,cnf={},**kw):
        # Pass any keyword options (width, height, and so on) through too
        Tkinter.Listbox.__init__(self,Master,cnf,**kw)
        self.bind("<ButtonPress>",self.StartDrag)
    def StartDrag(self,Event):
        Index=self.nearest(Event.y)
        ItemStr=self.get(Index)
        Tkdnd.dnd_start(DraggableRow(Index,ItemStr,self),Event)
    def dnd_accept(self,Item,Event):
        return self
    def dnd_leave(self,Item,Event):
        self.delete(Item.Index)
        Item.PreviousWidget=self
        Item.Widget=None
        Item.Index=None
    def dnd_enter(self,Item,Event):
        if (Item.Widget==self and Item.Index!=None):
            self.delete(Item.Index)
        Item.Widget=self
        NewIndex=self.nearest(Event.y)
        NewIndex=max(NewIndex,0)
        self.insert(NewIndex,Item.ItemStr)
        Item.Index=NewIndex
    def dnd_commit(self,Item,Event):
        pass
    def dnd_motion(self,Item,Event):
        if (Item.Index!=None):
            self.delete(Item.Index)
        NewIndex=self.nearest(Event.y)
        NewIndex=max(NewIndex,0)
        Item.Index=NewIndex
        self.insert(NewIndex,Item.ItemStr)

root=Tkinter.Tk()
LeftList=DragAndDropListbox(root)
LeftList.pack(side=Tkinter.LEFT,fill=Tkinter.BOTH,
    expand=Tkinter.YES)
RightList=DragAndDropListbox(root)
RightList.pack(side=Tkinter.RIGHT,fill=Tkinter.BOTH,
    expand=Tkinter.YES)
# Add some elements to the listbox, for testing:
for Name in ["Nene","Sylvia","Linna","Priscilla"]:
    LeftList.insert(Tkinter.END,Name)
root.mainloop()
Figure 20-3: Dragging and dropping elements between two listboxes


Using Cursors 

The standard widget option cursor specifies the name of a cursor image to use when the mouse is positioned over the widget. Setting cursor to an empty string uses the standard system cursor. For example, the following code creates a Quit button, and changes the cursor to a skull-and-crossbones when it is positioned over the button:

Tkinter.Button(root,text="Quit",command=sys.exit,
    cursor="pirate").pack()

Many cursors are available, ranging from the useful to the silly. Table 20-3 describes some useful cursors.
Table 20-3
Cursors

Name                       Description
left_ptr                   Pointer arrow; a good default cursor
watch                      Stopwatch; used to tell the user to wait while some operation finishes
pencil                     Pencil; good for drawing
xterm                      Insertion cursor; the default for Text and Entry widgets
trek, gumby, box_spiral    Some cute, silly cursors


The method after, which the root window and every other widget provide, executes a function after a specified amount of time has passed. (See “Using Timers” in Chapter 19.) The related method after_idle(function) executes a specified function as soon as Tkinter empties its event queue and becomes idle. It is a handy way to restore the cursor to normal after an operation has finished.
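A different way to guarantee the cursor comes back, sketched here as an alternative to after_idle rather than the approach Listing 20-5 uses, is a context manager that restores it in a finally clause. It assumes the slow work runs synchronously inside your handler. The name busy_cursor is hypothetical; it only needs an object supporting widget["cursor"] reads and writes, which Tkinter widgets provide, and it calls update() if one exists so the new cursor is drawn before the work starts.

```python
from contextlib import contextmanager

@contextmanager
def busy_cursor(widget, cursor="watch"):
    """Show `cursor` on `widget` for the duration of a with-block."""
    old_cursor = widget["cursor"]
    widget["cursor"] = cursor
    # Let Tkinter redraw so the user actually sees the busy cursor.
    update = getattr(widget, "update", None)
    if update is not None:
        update()
    try:
        yield widget
    finally:
        # Runs even if the work raises, so the cursor is always restored.
        widget["cursor"] = old_cursor
```

In a handler you would write `with busy_cursor(root): do_the_search()`. Unlike after_idle, the restore happens the moment the block ends, including when it ends with an exception.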

The example in Listing 20-5 finds .mp3 files in the current directory and all its subdirectories, and adds them to a playlist. It displays a busy cursor while it is searching the directories. (A fancier approach would be to spawn a child thread to do the search.)


Listing 20-5: WaitCursor.py 


import Tkinter
import os

OldCursor=""

def DoStuff():
    # Without this, the assignment below would create a local variable
    global OldCursor
    # Save the old cursor, so we can restore it later.
    # (In this example, we know the old cursor is just "")
    OldCursor=root["cursor"]
    # Change the cursor:
    root["cursor"]="watch"
    # Wait for Tkinter to empty the event loop. We must do
    # this, in order to see the new cursor:
    root.update()
    # Tell Tkinter to RestoreCursor the next time it's idle:
    root.after_idle(RestoreCursor)
    File=open("PlayList.m3u","w")
    os.path.walk(os.path.abspath(os.curdir),CheckDir,File)
    File.close()

def CheckDir(File,DirName,FileNames):
    # Write all the MP3 files in the directory to our playlist:
    for FileName in FileNames:
        if os.path.splitext(FileName)[1].upper()==".MP3":
            File.write(os.path.join(DirName,FileName)+"\n")

def RestoreCursor():
    root["cursor"]=OldCursor

root=Tkinter.Tk()
Tkinter.Button(text="Find files!",command=DoStuff).pack()
root.mainloop()
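Listing 20-5 relies on os.path.walk, which exists only in Python 2 and was removed in Python 3. A minimal sketch of the same search using os.walk (available in both) follows; find_mp3s and write_playlist are hypothetical helper names, and the matching rule is the same case-insensitive extension test that CheckDir uses.

```python
import os

def find_mp3s(top):
    """Return paths of all .mp3 files under `top`, matched case-insensitively."""
    matches = []
    # os.walk yields (directory, subdirectories, files) for every directory
    for dir_name, sub_dirs, file_names in os.walk(top):
        for file_name in file_names:
            if os.path.splitext(file_name)[1].upper() == ".MP3":
                matches.append(os.path.join(dir_name, file_name))
    return matches

def write_playlist(top, playlist_path="PlayList.m3u"):
    """Write one path per line, as CheckDir does in Listing 20-5."""
    with open(playlist_path, "w") as playlist:
        for path in find_mp3s(top):
            playlist.write(path + "\n")
```

Separating the search from the file writing also makes it easy to move find_mp3s into a worker thread later, the "fancier approach" mentioned above.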


Designing New Widgets 

You can create new widgets by combining or subclassing existing ones. However, before you do, do a quick search online; any widget you can imagine has probably been created already!

Listing 20-6 shows a simple example: a progress bar, which keeps track of progress as a percentage from 0 to 100. Figure 20-4 shows the program partway through its run.


Listing 20-6: ProgressBar.py 


import Tkinter
import time
import sys

class ProgressBar:
    def __init__(self, Parent, Height=10, Width=100,
        ForegroundColor=None,BackgroundColor=None,Progress=0):
        self.Height=Height
        self.Width=Width
        self.BarCanvas = Tkinter.Canvas(Parent,
            width=Width,height=Height,
            background=BackgroundColor,borderwidth=1,
            relief=Tkinter.SUNKEN)
        if (BackgroundColor):
            # Canvas has no "backgroundcolor" option; use "background"
            self.BarCanvas["background"]=BackgroundColor
        self.BarCanvas.pack(padx=5,pady=2)
        self.RectangleID=self.BarCanvas.create_rectangle(\
            0,0,0,Height)
        if (ForegroundColor==None):
            ForegroundColor="black"
        self.BarCanvas.itemconfigure(\
            self.RectangleID,fill=ForegroundColor)
        self.SetProgressPercent(Progress)
    def SetProgressPercent(self,NewLevel):
        self.Progress=NewLevel
        self.Progress=min(100,self.Progress)
        self.Progress=max(0,self.Progress)
        self.DrawProgress()
    def DrawProgress(self):
        ProgressPixel=(self.Progress/100.0)*self.Width
        self.BarCanvas.coords(self.RectangleID,
            0,0,ProgressPixel,self.Height)
    def GetProgressPercent(self):
        return self.Progress

# Simple demonstration:
def IncrementProgress():
    OldLevel=Bar.GetProgressPercent()
    if (OldLevel>99): sys.exit()
    Bar.SetProgressPercent(OldLevel+1)
    root.after(20,IncrementProgress)

root=Tkinter.Tk()
root.title("Progress bar!")
Bar=ProgressBar(root)
root.after(20,IncrementProgress)
root.mainloop()



Figure 20-4: A custom widget for displaying a progress bar 
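The arithmetic inside SetProgressPercent and DrawProgress can be pulled out and checked without a display. This sketch uses hypothetical function names. Note the division by 100.0 rather than 100: under Python 2, 50/100 is integer division and evaluates to 0, which would leave the bar empty.

```python
def clamp_percent(level):
    """Clamp a progress level into 0..100, as SetProgressPercent does."""
    return max(0, min(100, level))

def progress_pixels(level, width):
    """Width in pixels of the filled rectangle, as DrawProgress computes it."""
    # 100.0 forces floating-point division on Python 2 as well as Python 3.
    return (clamp_percent(level) / 100.0) * width
```

For a 200-pixel bar at 25 percent, progress_pixels(25, 200) gives 50.0, and out-of-range levels such as 120 or -10 are pinned to the full or empty bar.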
Further Tkinter Adventures

There are many more widgets, options, and tricks in Tkinter than are covered here. Following are some places to learn more.

Additional widgets

Python MegaWidgets (Pmw) is a large collection of Tkinter widgets. Examples include Notebook (a tabbed display) and Balloon (a class for adding popup help). Pmw is a nice way to develop fancier interfaces without becoming a Tk Jedi Master. Visit http://www.dscpl.com.au/pmw/ to check it out.

There are other collections of Tk widgets, such as Tix and BLT, that may help you save time developing a GUI.

Learning more 

The Tkinter distribution is lacking in documentation, but there are several good Tkinter references out there:

♦ An Introduction to Tkinter, by Fredrik Lundh. Comprehensive, with many good examples.

http://www.pythonware.com/library/tkinter/introduction/index.htm

♦ Python and Tkinter Programming, by John E. Grayson. Many interesting examples. Covers Pmw in great detail. The book’s Web site is at

http://www.manning.com/Grayson/

♦ The Tkinter topic guide: a good starting point for all things Tkinter.

http://www.python.org/topics/tkinter/doc.html

♦ The Tkinter Life Preserver, by Matt Conway.

http://www.python.org/doc/life-preserver/index.html

When all else fails, read up on Tk. The correspondence between Tkinter and Tk is straightforward, so anything you learn about Tk will carry over to Tkinter too.
Summary

Tkinter can handle sophisticated GUIs without much trouble. You can use the layout managers and event handlers to get your program’s appearance and behavior just right. In this chapter, you:

♦ Handled various events.

♦ Created advanced widgets and dialogs.

♦ Used custom mouse cursors.

In a later chapter, you learn all about the curses module, a good user interface choice for terminals on which graphics (and hence Tkinter) aren’t available.


Building User Interfaces with wxPython

Although it is not Python’s official user interface library, wxPython is becoming an increasingly popular set of tools for building graphical user interfaces. Like Tkinter, it is powerful, easy to use, and works on several platforms. This chapter gives you a jump start on using wxPython in your own applications.

In This Chapter

Introducing wxPython
Creating simple wxPython programs
Choosing different window types
Using wxPython controls
Controlling layout
Using built-in dialogs
Drawing with device contexts
Adding menus and keyboard shortcuts
Accessing mouse and keyboard input
Other wxPython features


Introducing wxPython

wxPython (http://wxpython.org) is an extension module that wraps a C++ framework called wxWindows (http://wxwindows.org). Both wxPython and wxWindows provide cross-platform support and are free for private as well as commercial use. This chapter focuses on the cross-platform GUI support provided by wxPython, but wxWindows also gives you cross-platform APIs for multithreading, database access, and so on.

Tip Visit the wxPython Web site for straightforward downloading and installing instructions, as well as the latest news and support. You can also join the wxPython community by subscribing to a free mailing list for questions, answers, and announcements. Visit http://wxpros.com for information about professional support and training.

The full feature set of wxPython deserves an entire book of its own, and a single chapter can barely scratch the surface. The purpose of this chapter, therefore, is to give you a high-level picture of what it supports, and to get you started on writing some wxPython programs of your own. You’ll still want to sift through the documentation later for additional options and features. Because wxPython is so easy to use, however, by the end of this chapter you’ll be able to write some very functional programs with very little effort.

In addition to its built-in features, wxPython can also detect and use some popular 
Python extension modules such as Numerical Python (NumPy) and PyOpenGL, the 
OpenGL bindings for Python. 

See Chapter 32 for an introduction to NumPy.


wxPython often outperforms Tkinter, both with large amounts of data and in overall responsiveness; it comes with a good set of high-level controls and dialogs; and it does a pretty good job of giving applications a native look and feel (which isn’t necessarily a goal of Tkinter anyway). For these reasons, and because I find using wxPython very straightforward and intuitive, I personally prefer wxPython over Tkinter even though it doesn’t ship as a standard part of the Python distribution.


Creating Simple wxPython Programs 

Most wxPython programs have a similar structure, so once you have that under 
your belt, you can quickly move on to programs that are more complex. Listing 21-1 
is a simple program that opens up a main window with a giant button in it. Clicking 
the button pops up a dialog box, as shown in Figure 21-1. 



Listing 21-1: wxclickme.py —A wxPython application 
with buttons 


from wxPython.wx import *

class ButtonFrame(wxFrame):
    'Creates a frame with a single button in the center'
    def __init__(self):
        wxFrame.__init__(self, NULL, -1, 'wxPython',
            wxDefaultPosition, (200, 100))
        button = wxButton(self, 111, 'Click Me!')
        EVT_BUTTON(self, 111, self.onButton)

    def onButton(self, event):
        'Create a message dialog when the button is clicked'
        dlg = wxMessageDialog(self, 'Ow, quit it.', \
            'Whine', wxOK)
        dlg.ShowModal()
        dlg.Destroy()

class App(wxApp):
    def OnInit(self):
        'Create the main window and insert the custom frame'
        frame = ButtonFrame()
        frame.Show(true)
        return true # Yes, continue processing

# Create the app and start processing messages
app = App(0)
app.MainLoop()



Figure 21-1: The program in Listing 21-1 opens the dialog box on the button click event


To understand this program, start at the end and work your way back. All wxPython programs instantiate a wxApp (or subclass) object and call its MainLoop method to start the message handling (MainLoop doesn’t return until the application window is closed). The wxApp subclass in the example, App, overrides the OnInit method that is called during initialization. OnInit creates a custom frame, ButtonFrame, makes it visible, and returns true (actually, wx.true) to signal success. These lines of code will be nearly identical for almost all your wxPython programs; for each new program, I usually cut and paste them from the previous program I wrote, changing only the name of the frame class to use.

A frame is a top-level window like the main window in most applications (it usually has a title bar, is resizable, and so forth). The __init__ method of the ButtonFrame class calls the parent (wxFrame) constructor to set the title to “wxPython” and the size to 200 pixels wide and 100 tall. It adds a button with the label Click Me!, and tells wxPython to route button-click messages for that button to ButtonFrame’s onButton method. Notice how trivial it is to set up event routing. The line

EVT_BUTTON(self, 111, self.onButton)

tells wxPython to take all button-click events generated in the current window (self) with an ID of 111 (an arbitrary number I chose and assigned to the button) and send them to the onButton method. The only requirement for the onButton method is that it take an event argument. You can use a method such as onButton as the handler for many different events (if it makes sense to do so) because it receives as an argument the event to process. Each event is derived from the wxEvent class and has methods that identify the event source, type, and so on. For example, if you registered onButton to handle events from several different buttons, onButton could call the event’s GetId() method to determine which button was clicked.
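To make ID-based routing concrete, here is a toy model in plain Python. It is not how wxPython implements EVT_BUTTON internally; EventRouter, FakeEvent, and new_id are hypothetical stand-ins that mimic the pattern described above: bind a handler to an (event type, ID) pair, let one handler serve several buttons, and tell them apart with GetId().

```python
import itertools

_next_id = itertools.count(1000)

def new_id():
    """Hypothetical stand-in for wxNewId(): a fresh integer on every call."""
    return next(_next_id)

class EventRouter(object):
    """Toy model of ID-based event routing (not wxPython's internals)."""

    def __init__(self):
        # Maps (event_type, window_id) -> handler function
        self._handlers = {}

    def bind(self, event_type, window_id, handler):
        """Roughly what EVT_BUTTON(window, id, func) asks wxPython to do."""
        self._handlers[(event_type, window_id)] = handler

    def dispatch(self, event_type, event):
        """Send `event` to the handler bound to its ID; report success."""
        handler = self._handlers.get((event_type, event.GetId()))
        if handler is None:
            return False
        handler(event)
        return True

class FakeEvent(object):
    """Hypothetical event carrying only the ID of the control that fired it."""

    def __init__(self, source_id):
        self._source_id = source_id

    def GetId(self):
        return self._source_id
```

Binding the same handler to two IDs and having it branch on event.GetId() is the pattern the paragraph above describes for several buttons sharing onButton.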

Tip Use the wxNewId() function to generate unique ID numbers.



The onButton method pops open a standard message dialog, waits for you to click OK, and closes it.

Fiddle around with the program until the basic structure makes sense and you’re comfortable with what’s happening. Conceptually, that’s the bulk of programming in wxPython: now you can just learn about other widgets besides buttons, and other events besides button-clicks. There’s plenty more to learn, of course, but the designers of wxPython have done an excellent job of insulating us from a lot of nasty details.


Choosing Different Window Types 

The wxWindow class is the base class of all other windows (everything from the main application window to a button or a text label is considered a window). Windows that can contain child windows come in two types: managed and nonmanaged.

Tip Repeat ten times out loud: "A button is a window." Nearly everything is a descendant of wxWindow; therefore, for example, if the documentation tells you that you can call some method to add a child window to a parent, bear in mind that the child window can be a panel, a button, a scrollbar, and so on.

Managed windows

A managed window is one that is directly controlled by the operating system’s window manager. The first type is one you’ve already seen, wxFrame, which often has a title bar, menus, and a status bar, and is usually resizable and movable by the user. wxMiniFrame is a wxFrame subclass that creates a tiny frame suitable for floating toolbars.

A wxDialog window is similar to a wxFrame window and is usually used to request input or display a message. When created with the wxDIALOG_MODAL style, the calling program can’t receive any user input until the dialog box is closed.

Managed window constructors are generally like wxWindow(parent, id, title[, position][, size][, style]), where parent can be None for managed windows, id can be -1 for a default ID, and style is a bitwise OR combination of several class-specific flags:
>>> from wxPython.wx import *
>>> f = wxFrame(None,-1,'',size=(200,100),
...     style=wxRESIZE_BORDER)
>>> f.Center(); f.Show(1) # Later, use f.Show(0) to hide it
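Combining style flags works like any other bit mask: OR the flags together to build a style, and AND with a flag to test for it. The constant values below are made up for illustration; real wxPython constants such as wxRESIZE_BORDER have their own library-defined values, and has_flag is a hypothetical helper.

```python
# Made-up flag values for illustration only (each is a distinct bit).
RESIZE_BORDER = 0x0040
SYSTEM_MENU = 0x0800
CAPTION = 0x2000

def has_flag(style, flag):
    """True if every bit of `flag` is set in `style`."""
    return (style & flag) == flag

# A resizable window with a title bar but no system menu:
style = RESIZE_BORDER | CAPTION
```

Because each flag occupies its own bit, ORing them never loses information, and the same style value can be tested for any individual flag later.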


Nonmanaged windows

Nonmanaged windows are controlled by wxPython, and you use them by placing them inside other windows. For example, the following creates a window with a resizable vertical split like the one shown in Figure 21-2:

>>> f = wxFrame(None,-1,'SplitterWindow',size=(200,100))

>>> s = wxSplitterWindow(f,-1) 

>>> s.SplitVertically(wxWindow(s,-1),wxWindow(s,-1)) 

1 

>>> f.Show(l) 

1 



Figure 21-2: A user-resizable splitter window 


Notice that wxSplitterWindow’s SplitVertically method takes as parameters the two windows it splits; for simplicity, I just created two plain windows. A wxPanel window is like a dialog box in that you place controls (buttons, text entry fields, and so on) in it, except that a panel lives inside another window such as a frame. The wxHtmlWindow class displays HTML files; you can even embed any wxPython widget within an HTML page and have it respond to events normally.

Consult demo.py in the wxPython distribution for information about embedding widgets in HTML pages. The demo also contains terrific examples of many other wxPython features.

You can add scrolling to any window by first placing it inside a wxScrolledWindow instance. Be sure to call its SetScrollbars method to initialize the size of the scrollbars. Some windows, such as wxHtmlWindow, are derived from wxScrolledWindow, or already have scrolling support to save you the trouble.

The wxGrid class gives your application a spreadsheet-like table with rows and columns. It has plenty of standard helpers for controlling user input or displaying data in certain ways, or you can implement your own grid cell renderers.

The wxStatusBar and wxToolBar classes enable you to add a status bar and a toolbar to any frame (call the frame’s SetStatusBar and SetToolBar methods, respectively). In the wxPython.lib.floatbar module, you’ll find wxFloatBar, a wxToolBar subclass implemented in Python that provides “dockable” toolbars that users can pull out of the frame and move elsewhere.

Applications such as Microsoft Visual Studio enable you to open several files at a time, each in a separate child window that can’t leave the boundaries of a single parent window. wxPython enables you to create applications with this style of interface using the wxMDIChildFrame, wxMDIClientWindow, and wxMDIParentFrame classes.

The program in Listing 21-2 creates a viewer for HTML files stored locally. Notice in 
Figure 21-3 that it uses a wxNotebook window to enable you to open several HTML 
files simultaneously, and the toolbar has buttons for adding and removing pages as 
well as quitting the application. 


Listing 21-2: grayui.py — A local HTML file viewer 


from wxPython.wx import *
from wxPython.html import *
from wxPython.lib.floatbar import *
import time,os

class BrowserFrame(wxFrame):
    'Creates a multi-pane viewer for local HTML files'
    ID_ADD = 5000
    ID_REMOVE = 5001
    ID_QUIT = 5002

    # Load support for viewing GIF files
    wxImage_AddHandler(wxGIFHandler())

    def __init__(self):
        wxFrame.__init__(self, NULL, -1, 'GrayUI')

        # Create a toolbar with Add, Remove, and Quit buttons
        tb = wxFloatBar(self,-1)
        addWin = wxButton(tb,self.ID_ADD,'Add new window')
        removeWin = wxButton(tb,self.ID_REMOVE,
            'Remove current window')
        quit = wxButton(tb,self.ID_QUIT,'Quit')

        # Tie button clicks to some event handlers
        EVT_BUTTON(tb,self.ID_ADD,self.OnAdd)
        EVT_BUTTON(tb,self.ID_REMOVE,self.OnRemove)
        EVT_BUTTON(tb,self.ID_QUIT,self.OnQuit)

        # Add the buttons to the toolbar
        tb.AddControl(addWin)
        tb.AddControl(removeWin)
        tb.AddSeparator()
        tb.AddControl(quit)
        tb.Realize()
        self.SetToolBar(tb)
        tb.SetFloatable(1)

        # Create a notebook to hold each window
        self.note = wxNotebook(self, -1)

    def GetFileName(self):
        'Gets the full path of an HTML file from the user'
        types = 'HTML files|*.html;*.htm' # Limit types to view
        dlg = wxFileDialog(self,style=wxOPEN|wxFILE_MUST_EXIST,
            wildcard=types)
        file = ''
        # GetPath (unlike GetFilename) includes the directory;
        # leave file empty if the user cancels
        if dlg.ShowModal() == wxID_OK:
            file = dlg.GetPath()
        dlg.Destroy()
        return file

    def OnAdd(self,event):
        'Adds a new HTML window'
        file = self.GetFileName()
        if file:
            newWin = wxHtmlWindow(self.note,-1)
            self.note.AddPage(newWin,os.path.split(file)[1],1)
            newWin.LoadPage(file)

    def OnRemove(self,event):
        'Removes the current HTML window'
        page = self.note.GetSelection()
        if page != -1:
            self.note.DeletePage(page)
            self.note.AdvanceSelection()

    def OnQuit(self,event):
        self.Destroy()

class App(wxApp):
    def OnInit(self):
        'Create the main window and insert the custom frame'
        frame = BrowserFrame()
        frame.Show(true)
        return true

# Create an app and go!
app = App(0)
app.MainLoop()







Figure 21-3: Build this simple viewer to display the documentation that 
ships with Python. 


This application uses an instance of the wxFloatBar class (a wxToolBar child) to create a floating toolbar. (Try it out: click on the toolbar and drag it around the screen. Close it or move it back over its original location to dock it.) Although I just added some normal buttons, you can use the AddTool method to add icons like the ones you find on toolbars in many applications.

Using the wxNotebook class is straightforward; for each tab, create a new window that is a child of the notebook, and add it with a call to AddPage or InsertPage. Likewise, the wxHtmlWindow class is an easy way to display HTML pages. The BrowserFrame class definition contains a call to wxImage_AddHandler so that it can view CompuServe GIF files.

A PyShellWindow enables users to access a Python interpreter running in interactive mode:

from wxPython.wx import *
from wxPython.lib.pyshell import PyShellWindow

class App(wxApp):
    def OnInit(self):
        frame = wxFrame(None,-1,'MyPyShell')
        PyShellWindow(frame,-1)
        frame.Show(true)
        return true

app = App(0)
app.MainLoop()
Using wxPython Controls

wxPython ships with a comprehensive set of high-level controls, or widgets. Most often, you place them in a wxPanel or wxDialog, but they can be used elsewhere, such as in a status bar or toolbar. This section shows you what controls are available and how to use some of them; the process of controlling their layout is covered in the “Controlling Layout” section.

Common controls

Figure 21-4 shows most of the common controls available to you in wxPython.



Figure 21-4: Names and examples of the common wxPython controls


You can use wxButton and wxBitmapButton to trigger an event; use the EVT_BUTTON(window, id, func) function to link a button ID and an event handler. The FileBrowseButton button combines a button, a file dialog, and a text entry widget so that when clicked, the user browses for a file and the chosen file name ends up in the text entry field. FileBrowseButtonWithHistory’s text entry field has a drop-down list in which you can store previous choices. The wxGenButton class is a button class that is implemented by wxPython (and not natively) so that you can customize how the button behaves and how it looks when pressed. See the wxGenBitmapButton, wxGenToggleButton, and wxGenBitmapToggleButton classes for additional variations.

Most controls let you attach event handlers that run when the user modifies the control’s state. For example, if you use the EVT_CHECKBOX(window, id, func) function, your handler function will be called anytime the checkbox is toggled.

Controls with a user-defined value (such as a text entry widget) usually have one or more Get methods to retrieve the user’s input. wxSlider.GetValue(), for example, returns the current position of the slider. Controls that let users choose from a predefined set of values usually have methods such as GetSelection. wxChoice.GetSelection() returns the 0-based index of the currently selected string. Each Get method of a control usually has a corresponding Set method that you can use to programmatically set the control’s state.

Tree controls

wxTreeCtrl is the standard tree control in wxPython. Use the code in Listing 21-3 to create the tree shown in Figure 21-5.



Figure 21-5: wxPython's tree control showing the results of nested dir() calls


Listing 21-3: treedemo.py — Sample using wxTreeCtrl


from wxPython.wx import *

class TreeFrame(wxFrame):
    def __init__(self):
        wxFrame.__init__(self, NULL, -1,
            'Tree Demo',size=(300,400))
        # Make it a scrolled window so all data fits
        scroll = wxScrolledWindow(self,-1)
        self.tree = wxTreeCtrl(scroll)
        EVT_SIZE(scroll,self.OnSize)

        # Populate a small tree
        parent = self.tree.AddRoot('dir()')
        for i in dir():
            child = self.tree.AppendItem(parent,i)
            for j in dir(i):
                grandchild = self.tree.AppendItem(child,j)

    def OnSize(self, event):
        # Make the tree control the size of the client area
        self.tree.SetSize(self.GetClientSizeTuple())

class App(wxApp):
    def OnInit(self):
        'Create the main window and insert the custom frame'
        frame = TreeFrame()
        frame.Show(true)
        return true

app = App(0)
app.MainLoop()


Apart from the usual initialization work, there is code to populate the tree and to ensure that the tree control fills the entire client area of the frame (using the EVT_SIZE event function). You create a root node with a call to AddRoot, and then add children with AppendItem calls. Refer to the documentation for information about other features, including support for event notification, editing items, and using icons in the tree.
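The populating loop in Listing 21-3 is easier to follow once you notice what dir() returns there: inside __init__, dir() lists the local names (such as 'parent', 'scroll', and 'self'), and each dir(i) is called on one of those name strings, so the grandchildren are the attribute names of a Python string. This sketch builds the same shape with a hypothetical TreeNode class instead of wxTreeCtrl, so it runs without a display.

```python
class TreeNode(object):
    """Hypothetical stand-in for a tree control item (not wxTreeCtrl)."""

    def __init__(self, label):
        self.label = label
        self.children = []

    def append_item(self, label):
        """Add and return a child, much as AppendItem returns the new item."""
        child = TreeNode(label)
        self.children.append(child)
        return child

def build_dir_tree(names):
    """Mirror Listing 21-3: one child per name, one grandchild per dir(name)."""
    root = TreeNode("dir()")
    for name in names:
        child = root.append_item(name)
        # dir() of a string lists str attributes such as 'upper'
        for attr in dir(name):
            child.append_item(attr)
    return root
```

Returning the new node from append_item is what lets the nested loop attach grandchildren, the same reason the listing keeps the value AppendItem returns.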

wxPython.lib.mvctree has the wxMVCTree class, which is a tree control that uses a model-view-controller architecture in which code to display the information is largely independent of the code to store the data. Such a model enables you to change one with little or no change to the other.

Editor controls

The wxEditor and wxPyEditor classes (in wxPython.lib.editor) are rudimentary text editor controls (wxPyEditor is a wxEditor subclass that adds syntax highlighting). A more heavyweight and advanced edit control is wxStyledTextCtrl (in wxPython.stc). It enables you to mix different fonts and font attributes much like a word processor, and it has built-in syntax highlighting for a few languages, including Python.


Controlling Layout 

When you put more than one control into a panel, dialog box, or other container, you have to decide how you want to lay out, or organize, them. In some cases, you can get by with specifying exact x and y coordinates for the controls. Other times, you need to correctly handle layout if there is a change in window size, default font (vision-impaired users often use a larger default font, for example), or platform (this is Python, after all). wxPython gives you several mechanisms to control the layout.

Tip It's important to learn what layout options are available to you, but if you plan to build a lot of user interfaces, consider acquiring a tool such as wxDesigner, Boa Constructor, or wxStudio to help you out.

As you learn about the different types of layout mechanisms, don’t be fooled into thinking that you always have to choose one to the exclusion of another. You should use whatever works best for your particular situation, and that may mean mixing them together. You can’t combine them within the same container (a panel, window, and so on), but you can have child containers use different methods. For example, your GUI could have two panels, one that uses sizers and one that uses layout constraints; and then you can lay them both out in the main window using hard-coded coordinates.

Specifying coordinates 

The simplest way is occasionally the best. The constructor for every control takes two optional parameters, size and pos, that specify the control’s size and position, respectively:

>>> from wxPython.wx import *
>>> dlg = wxDialog(None,-1,'Hey',size=(200,200))
>>> dlg.Show(1)
1
>>> dlg.SetSize((200,200))
>>> wxButton(dlg,-1,'Quit',pos=(10,100),size=(100,25))

Using size and pos, you can manually control the exact size and position of each control. It can be pretty tedious, however, so if this is the route you choose, build your GUI in an interactive Python session so that you can fine-tune it without having to re-run your program.
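When you do place controls by hand, a small helper that computes the pos and size tuples cuts down on the tedium. column_layout is a hypothetical helper and its default margins are arbitrary; you would pass each pair as the pos and size arguments when creating a control, as in the wxButton call above.

```python
def column_layout(count, x=10, y=10, width=100, height=25, gap=5):
    """Return (pos, size) tuples for `count` controls stacked in a column.

    Hypothetical helper: each control sits `gap` pixels below the previous one.
    """
    return [((x, y + i * (height + gap)), (width, height)) for i in range(count)]
```

For three buttons this yields positions (10, 10), (10, 40), and (10, 70), each 100 by 25 pixels; changing height or gap in one place moves the whole column consistently.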

Tip After you’ve added a control to a container, you can adjust its size and position by calling its SetSize and SetPosition methods:

myButton.SetSize((200,100)) # Both methods take a tuple

wxWindows ships with a simple dialog editor (and documentation) that creates a WXR file describing the layout of your dialog box, and you can use WXR files in wxWindows or wxPython programs. For example, if you have a file called sample.wxr and it contains the definition for a dialog box named ‘MyDialog’, you could open the dialog as follows:

wxResourceParseFile('sample.wxr')
dlg = wxDialog(parent, -1, '')
dlg.LoadFromResource(parent,'MyDialog')
dlg.ShowModal()

The call to wxResourceParseFile needs to happen only once, so you could call it during initialization.

Tip If your dialog box looks great in the dialog editor, but looks compressed or otherwise messed up in your program, toggle the useDialogUnits flag in the dialog box's properties in the editor.

The downside to using fixed coordinates is that, well, they’re fixed. A well-organized 
dialog box on one platform may look lousy on another, and if you have to change 
something later, you might end up doing a lot of extra work. One alternative is to 
create a different version of the resource file for each platform, and load the appro- 
priate one on startup. Despite these potential problems, precise widget layout 
sometimes requires less effort than wxPython’s other layout mechanisms, so you’ll 
have to judge for yourself. One approach that has helped me is to sketch out on
paper the GUI I plan to build and then divide it up into small groups of controls.
Implement each group with a wxPanel that has its controls laid out at specific
coordinates, and then use sizers (see the next section) to add the different groups
to the window.

Sizers 

Sizers are objects that help control window layout by dividing a window into
subwindows that are laid out according to sizer rules. A sizer talks to all of its child
objects to determine its own minimum size, which it reports to its parent. You can
nest sizers inside other sizers to form an arbitrarily complex and deep nesting. The
sizers you’ll use are subclasses of wxSizer, but if you want to create your own
sizer type, you should derive it from wxPySizer.

Box sizers 

wxBoxSizer and wxStaticBoxSizer are the simplest forms of sizers, and the two
are the same except that wxStaticBoxSizer includes a wxStaticBox control
around the outside of all of its child objects. A box sizer lays out controls to
form either a row or a column, which you choose when you create the sizer:

sizer = wxBoxSizer(wxVERTICAL)  # A sizer that creates a column
box = wxStaticBox(myFrame, -1, 'Stuff')
sizer = wxStaticBoxSizer(box, wxHORIZONTAL)  # A row with border

The direction you choose is called its primary orientation, so a wxBoxSizer with
wxVERTICAL has a vertical primary orientation. Once you have your sizer, you add
objects to it using its Add or Prepend methods (Add puts the new object at the end
of the group, Prepend at the beginning), which have the following forms:
sizer.Add(window, option, flag, border)  # Add window or widget
sizer.Add(sizer, option, flag, border)   # Add a child sizer
sizer.Add(w, h, option, flag, border)    # Add a spacer

Tip When you add a window or control, keep in mind that when you create the control,
it is still a child of a window, not a sizer (so don’t try to use the sizer as the parent
argument in the control's constructor). You can pass to Add (or Prepend) a control
or window, a child sizer (which may in turn contain other sizers), or the width and
height of an invisible spacer object to pad between two items.

When the sizer is laying out its items and has extra space along its primary orienta- 
tion, it looks at the option argument to determine how much extra space to give to 
each one. A value of 0 means that the item does not change size. If one item has an
option value of 1 and another has a value of 2, the second item will get twice as
much space as the first.
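As a rough illustration of the option rule just described, the following plain-Python sketch splits a number of spare pixels among items in proportion to their option values. It is not wxPython code: distribute_extra_space is a hypothetical helper, and it assumes that rounding leftovers go to the last growable item. Real wxWindows releases may compute sizes differently (for example, by dividing the total space rather than only the extra space).

```python
def distribute_extra_space(extra, options):
    """Split `extra` pixels among items in proportion to their option values.

    Hypothetical model of the rule in the text, not the wxPython API.
    Items whose option is 0 never grow. Any pixels lost to integer
    division are given to the last item that is allowed to grow.
    """
    total = sum(options)
    if total == 0:
        return [0] * len(options)          # nothing can grow
    shares = [extra * opt // total for opt in options]
    leftover = extra - sum(shares)
    # Hand the rounding remainder to the last growable item
    for i in range(len(options) - 1, -1, -1):
        if options[i] > 0:
            shares[i] += leftover
            break
    return shares


# Option values from the box-sizer example: One, Two, Three, spacer, Four, Five
print(distribute_extra_space(150, [1, 2, 2, 2, 4, 4]))
```

With 150 spare pixels the items receive 10, 20, 20, 20, 40, and 40 pixels, so an item with option 4 gets four times the extra space of an item with option 1.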

The flag argument is a bitwise OR of several values that tell the sizer the border
type to use around the item and what it should do with extra space along the
opposite, or secondary, orientation. The border can be any combination of wxTOP,
wxBOTTOM, wxLEFT, or wxRIGHT (wxALL puts them all together for you). For
example, if you want a blank border around the top and left sides of your widget, you
could use a flag of wxTOP | wxLEFT.

If the flag value contains wxGROW (or wxEXPAND), the item will grow to fill the
available extra space. A value of wxSHAPED means that it will grow proportionally so that
it always maintains the original aspect (width-to-height) ratio. Instead of growing,
the item can remain aligned against a side (by using wxALIGN_LEFT, wxALIGN_CENTER,
wxALIGN_RIGHT, wxALIGN_TOP, or wxALIGN_BOTTOM).
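The aspect-ratio idea behind wxSHAPED can be sketched in plain Python: scale the item by the largest factor that still fits the available area in both directions. This is only a model of the ratio rule under that assumption; shaped_size is a hypothetical helper, and the real sizer also takes borders and the primary orientation into account.

```python
def shaped_size(orig_w, orig_h, avail_w, avail_h):
    """Largest (width, height) that fits in (avail_w, avail_h) while
    keeping the original width-to-height ratio.

    Hypothetical helper modeling wxSHAPED's ratio rule only.
    """
    # The tighter of the two directions limits how far the item can grow
    scale = min(float(avail_w) / orig_w, float(avail_h) / orig_h)
    return int(orig_w * scale), int(orig_h * scale)


print(shaped_size(100, 50, 300, 100))
```

Here a 100x50 item offered a 300x100 area grows to 200x100: the height limits the scale to 2, so the 2:1 ratio is preserved.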

The border argument is the number of pixels of padding around the item, and it
makes sense only if the flag argument specifies one or more borders (such as
wxTOP).

The sizers also have an AddMany method that you can use to combine multiple 
Add calls. 

Call the parent window’s SetSizer(sizer) method to tell it to use your new sizer.
When the window’s Layout() method is called, the window will lay out its contents
with help from the sizer. An alternative is to call the window’s SetAutoLayout(1)
method so that it automatically calls Layout anytime the window size changes.

The sizer.Fit(window) method resizes the parent window to the minimum
acceptable size of its contents. If you then call sizer.SetSizeHints(window), the
sizer will remember the current size as the minimum and prevent the user from
ever making the window smaller than that minimum.

Before all of this seeps out of your brain, try the following code so you can see a
wxBoxSizer in action:
>>> from wxPython.wx import *
>>> f = wxFrame(None,-1,'Sizer Test')
>>> f.Show(1)
>>> sizer = wxBoxSizer(wxVERTICAL)
>>> sizer.Add(wxButton(f,-1,'One'),1,wxALL|wxALIGN_LEFT,3)
>>> sizer.Add(wxButton(f,-1,'Two'),2,wxALIGN_RIGHT)
>>> sizer.Add(wxButton(f,-1,'Three'),2,wxALL|wxALIGN_CENTER,3)
>>> sizer.Add(10,10,2,wxALL,3)
>>> sizer.Add(wxButton(f,-1,'Four'),4,wxALL|wxGROW,3)
>>> sizer.Add(wxButton(f,-1,'Five'),4,wxALL,3)
>>> f.SetAutoLayout(1)
>>> f.SetSizer(sizer)
>>> sizer.Fit(f)
>>> sizer.SetSizeHints(f)

Resize the window in each direction, and once you’re done playing, use f.Show(0) to
make the window go away. As shown in Figure 21-6, vertically (the primary
orientation) the buttons grow according to the option value used (for example, button Five
is four times as tall as button One). Most of the buttons have a three-pixel border on
all sides, and their horizontal alignment, or stretching, follows the flag values.



Figure 21-6: Buttons resize and align according to 
the rules of the box sizer. 


A good exercise for you to try now would be to replace one of the buttons with a 
horizontal wxBoxSizer that also contains buttons of its own. This forms a row of
buttons that are treated as a single unit by the parent sizer, but are laid out individ- 
ually by the child sizer. This will help you see how you can use a hierarchy of 
nested sizers to achieve a complex layout. 

Grid sizers 

wxGridSizer lays out objects in a table. The width of each column is the width of
the widest item in the grid; and the height of each row is that of the tallest item. You
create this sizer by calling wxGridSizer([rows|cols]=n), where n is the number
of rows or columns you want. You choose either rows or cols to limit the number
of rows or columns, and wxGridSizer figures out the correct value for the other
dimension. For example, if you set a limit of two rows and then added seven
buttons to the sizer, the first row would have the first four buttons, and the second
row would have the last three buttons.
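The arithmetic in the seven-button example can be checked with a short plain-Python sketch. grid_shape and fill_rows are hypothetical helpers, not wxPython functions; they assume that fixing one dimension sets the other by ceiling division and that items fill the grid row by row, as the example describes.

```python
def grid_shape(count, rows=0, cols=0):
    """Return (rows, cols) for `count` items when one dimension is fixed.

    Hypothetical helper; -(-a // b) is integer ceiling division.
    """
    if rows:
        return rows, -(-count // rows)
    if cols:
        return -(-count // cols), cols
    raise ValueError("specify rows or cols")


def fill_rows(items, rows=0, cols=0):
    """Place items into the grid row by row."""
    _, ncols = grid_shape(len(items), rows, cols)
    return [items[i:i + ncols] for i in range(0, len(items), ncols)]


buttons = ["B%d" % i for i in range(1, 8)]
print(fill_rows(buttons, rows=2))
# [['B1', 'B2', 'B3', 'B4'], ['B5', 'B6', 'B7']]
```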

wxFlexGridSizer is like wxGridSizer except that instead of having uniform
column and row sizes, each column is the width of the widest item in that column
only; and the height of each row is that of the tallest item in that row only.
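The difference between the two grid sizers comes down to where the maximum is taken. The sketch below is a plain-Python model, not wxPython code: cell_sizes is a hypothetical helper that takes rows of (width, height) minimum sizes and returns column widths and row heights, either per column and row (the flexible case) or uniform across the whole grid.

```python
def cell_sizes(grid, flexible):
    """Compute (column widths, row heights) for a grid of (w, h) sizes.

    Hypothetical model: flexible=True mimics wxFlexGridSizer,
    flexible=False mimics the uniform cells of wxGridSizer.
    """
    ncols = max(len(row) for row in grid)
    # Widest item in each column, tallest item in each row
    col_widths = [max(row[c][0] for row in grid if c < len(row))
                  for c in range(ncols)]
    row_heights = [max(h for _, h in row) for row in grid]
    if flexible:
        return col_widths, row_heights
    # Uniform grid: every column and row takes the overall maximum
    return [max(col_widths)] * ncols, [max(row_heights)] * len(grid)


grid = [[(50, 20), (80, 25)],
        [(60, 30), (40, 20)]]
print(cell_sizes(grid, flexible=True))
print(cell_sizes(grid, flexible=False))
```

The flexible layout gives columns of 60 and 80 pixels, while the uniform layout makes both columns 80 pixels wide.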


Layout constraints 

Layout constraints define the size and position of an item in terms of its siblings or 
parents. Each item has eight constraints that you can define: four for the edges 
(left, right, top, and bottom), two for the size (width and height), and two for its 
center (x, y). For example, you might constrain a button by specifying that its
height should be left unchanged, its left edge should be aligned with that of some
other button, its width should be half that of the parent panel, and its center y
coordinate should match that of some other widget’s top:

wc = wxLayoutConstraints()
wc.height.AsIs()  # "Don't change it"
wc.left.SameAs(someButton, wxLeft)
wc.width.PercentOf(parentPanel, wxWidth, 50)
wc.centerY.SameAs(someOtherWidget, wxTop)
myButton.SetConstraints(wc)


You usually have to specify four of the eight constraints in order for the widget to 
be fully constrained. Once it is fully constrained, the layout algorithm can deduce 
the remaining constraints on its own.

The constraint names are left, right, top, bottom, width, height, centerX, and
centerY. You can call the following methods for each constraint:


Above(win[, margin]), Below(win[, margin]),
LeftOf(win[, margin]), RightOf(win[, margin])
    Sets the constraint to be above, below, to the left of, or to the right
    of the window win, with an optional margin

Absolute(value)
    Sets the constraint to this value. For example, wc.left.Absolute(10)
    gives the left edge an x coordinate of 10.

AsIs()
    Does not change the constraint’s current value

Unconstrained()
    Returns this constraint to its default state

PercentOf(win, edge, percent)
    Makes the current constraint a percentage of the given edge of the
    given window

SameAs(win, edge[, margin])
    Makes the current constraint the same as the given edge of the given
    window


As with sizers, you can call a window’s Layout() method to perform the layout, or
you can call SetAutoLayout(1) so that Layout is called each time the parent
window is resized.


Layout algorithms 

For MDI or SDI applications, you can use the wxLayoutAlgorithm class to lay out
subwindows. Study the wxLayoutAlgorithm and wxSashLayoutWindow
documentation for more information.


Using Built-in Dialogs 

One of wxPython’s strengths is its rich set of built-in dialogs that you can use to get 
user input. In general, the way you use each dialog follows this pattern (the exam- 
ple here uses a dialog that has the user choose a directory name): 


dlg = wxDirDialog(None)          # Create it
if dlg.ShowModal() == wxID_OK:   # Check the return code
    path = dlg.GetPath()         # Read user's input
dlg.Destroy()                    # Destroy it


The dialog’s ShowModal method usually returns wxID_OK or wxID_CANCEL, and
each dialog has its own set of methods you use to retrieve the user’s input.

Table 21-1 describes some of wxPython’s useful built-in dialogs. 


Table 21-1
Useful wxPython Dialogs

Class                      Use the Dialog To
wxDirDialog                Browse for a directory name
wxFileDialog               Browse for a file name
wxFontDialog               Choose a font, point size, color, and so on
wxColourDialog             Choose a color
wxPrintDialog              Select a printer
wxPageSetupDialog          Modify page orientation and margins
wxProgressDialog           Display a moving progress meter
wxMessageDialog            Display a simple message
wxScrolledMessageDialog    Display a longer message in a scrollable window
wxSingleChoiceDialog       Choose an item from a list
wxMultipleChoiceDialog     Choose one or more items from a list
wxTextEntryDialog          Enter a line of text
wxBusyInfo                 Notify the user that the program is temporarily busy


The wxBusyInfo dialog is unique in that the dialog appears as soon as you create
it, and it disappears when the object goes out of scope, so keep a reference to it
for as long as the work lasts:

def rollbackChanges(self):
    busy = wxBusyInfo('Reverting to previous state...')
    # Do some work; the dialog is destroyed automatically when
    # busy goes out of scope at the end of the method

Drawing with Device Contexts 

Like some other GUI frameworks, wxPython uses device contexts as an abstraction
for displaying information on some output device. All device contexts are
descendants of the wxDC class, so code that outputs to a device context automatically
works whether the output device is the screen, a printer, or just a file. Table 21-2
lists some common device context classes.


Table 21-2
wxPython Device Context Classes

Class             Outputs To
wxWindowDC        An entire window, including title bars and borders
wxClientDC        Window client area outside of the OnPaint method
wxPaintDC         Window client area during a call to OnPaint
wxPrinterDC       A Microsoft Windows printer
wxPostScriptDC    A PostScript file or printer
wxMemoryDC        A bitmap
wxMetaFileDC      A Microsoft Windows metafile










Chapter21 -4- Building User Interfaces with wxPython 409 


Device contexts give you a large number of methods to perform all sorts of actions,
including clipping; writing text; converting between different units; and drawing
graphics primitives, including lines, arcs, and splines.

Tip To ensure that your programs work on Microsoft Windows, call the device
context's BeginDrawing() method before drawing, and call its EndDrawing()
method when you're done.

The device context uses the current pen to draw lines and outlines; pens (wxPen)
have attributes such as line thickness and color. Text color is not affected by pen
color. To fill in regions, it uses the current brush (wxBrush), which can have both a
color and a pattern that it uses when filling.

The program in Listing 21-6 shows you how to use a device context to paint on the 
screen, and generates output as shown in Figure 21-7. 


Listing 21-6: wxcanvas.py - An example of 
drawing with device contexts 


from wxPython.wx import *
import random

class CanvasFrame(wxFrame):
    # A list of stock brushes we can use instead of
    # creating our own
    brushes = [wxBLACK_BRUSH,wxBLUE_BRUSH,
               wxCYAN_BRUSH,wxGREEN_BRUSH,
               wxGREY_BRUSH,wxRED_BRUSH,wxWHITE_BRUSH]

    def __init__(self):
        wxFrame.__init__(self,None,-1,
                         'CanvasFrame',size=(550,350))
        self.SetBackgroundColour(wxNamedColor("WHITE"))

        # Capture the paint message
        EVT_PAINT(self, self.OnPaint)

    def OnPaint(self, event):
        dc = wxPaintDC(self)
        dc.BeginDrawing()

        # Draw a grid of randomly colored boxes
        for y in range(15):
            for x in range(10):
                dc.SetBrush(random.choice(self.brushes))
                dc.DrawRectangle(x*20,y*20,20,20)

        # Draw a random polygon over the boxes
        # (Outline is in blue, but fill color is that
        # of the last box it drew)
        dc.SetPen(wxPen(wxNamedColour('BLUE')))
        pts = []
        for i in range(20):
            pts.append((random.randint(0,200),
                        random.randint(0,300)))
        dc.DrawPolygon(pts)

        # Draw some rotated text
        font = wxFont(20, wxNORMAL, wxNORMAL, wxNORMAL)
        font.SetFaceName('Jokerman LET')
        dc.SetFont(font)
        for a in range(0, 360, 20):
            c = int(a * 0.71)  # 255/360: fit angle into color range
            dc.SetTextForeground(wxColour(c,c,c))
            dc.DrawRotatedText(" wxPython", 350, 150, a)

        dc.EndDrawing()

class App(wxApp):
    def OnInit(self):
        'Create the main window and insert the custom frame'
        frame = CanvasFrame()
        frame.Show(true)
        return true

app = App(0)
app.MainLoop()



Figure 21-7: Using device contexts to draw graphics

The __init__ function calls EVT_PAINT so that the OnPaint method will be called
each time the screen needs to be redrawn. Notice that OnPaint creates a
wxPaintDC for drawing, and that it begins and ends with calls to BeginDrawing
and EndDrawing.


Adding Menus and Keyboard Shortcuts 

Your wxPython application can have popup menus or groups of menus on a menu 
bar at the top of a frame. Individual menu items can be disabled or grayed out, and 
each can have an associated line of help text. 

A menu consists of one or more menu items, each of which has a unique numerical
identifier. Create a menu by calling wxMenu([title]), and add items with its
Append(id, name) method:

menu = wxMenu()
menu.Append(10, 'Load')
menu.Append(11, 'Save')
menu.Append(12, 'Quit')

The menu title is displayed as part of the menu’s contents. Create a menu bar by
calling wxMenuBar(). Attach a menu to a menu bar by calling the menu bar’s
Append(menu, title) method:

mb = wxMenuBar()
mb.Append(menu, 'File')

Finally, call a frame’s SetMenuBar(bar) method to attach the menu bar to the 
frame: 

frame.SetMenuBar(mb) 


Tip By creating menu items separately as wxMenuItems, you can create more
powerful menu items, such as menu items with bitmaps.

Accelerators are keyboard shortcuts for commands users would normally have to
generate with the mouse (clicking a menu item, for example). By calling a window’s
SetAcceleratorTable(table) method, you can assign a group of shortcuts to
that window. You create an accelerator table by calling the
wxAcceleratorTable(list) constructor, where list is a list of accelerator entry
tuples of the form (flag, code, command). flag is a bitwise-OR combination of
keypress modifiers such as wxACCEL_ALT and wxACCEL_SHIFT, and code is the
ASCII code of the keypress or one of wxPython’s many special key variables, such
as WXK_F10 (for the F10 key) or WXK_END (the End key). command is the menu item
identifier. For example:
accel = [(wxACCEL_CTRL,WXK_ESCAPE,10),
         (wxACCEL_NORMAL,WXK_ESCAPE,11),
         (wxACCEL_CTRL|wxACCEL_SHIFT,WXK_F1,12)]
frame.SetAcceleratorTable(wxAcceleratorTable(accel))

enables Ctrl-Esc, Esc, and Ctrl-Shift-F1 as accelerators for menu commands 10
through 12.
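wxAcceleratorTable does the matching for you, but the idea behind the (flag, code, command) tuples can be modeled in a few lines of plain Python. In this sketch the ACCEL_* and KEY_* values are stand-ins chosen for illustration, not wxPython's real constants, and build_table and lookup are hypothetical helpers. The point it shows is that a modifier combination is a single bitwise-OR value, so the order in which you OR the flags together does not matter.

```python
# Stand-in values for this sketch only; wxPython defines its own constants.
ACCEL_NORMAL, ACCEL_ALT, ACCEL_CTRL, ACCEL_SHIFT = 0, 1, 2, 4
KEY_ESCAPE, KEY_F1 = 27, 340   # hypothetical key codes


def build_table(entries):
    """Map (modifier flags, key code) pairs to command identifiers."""
    table = {}
    for flags, code, command in entries:
        table[(flags, code)] = command
    return table


def lookup(table, flags, code):
    """Return the command bound to this key combination, or None."""
    return table.get((flags, code))


accel = [(ACCEL_CTRL, KEY_ESCAPE, 10),
         (ACCEL_NORMAL, KEY_ESCAPE, 11),
         (ACCEL_CTRL | ACCEL_SHIFT, KEY_F1, 12)]
table = build_table(accel)
print(lookup(table, ACCEL_SHIFT | ACCEL_CTRL, KEY_F1))  # 12
```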


Accessing Mouse and Keyboard Input 

Most input events are handled by wxPython directly. When a user clicks a button, 
for example, the window automatically processes the clicking and releasing of the 
mouse button. If necessary, however, you can intercept and handle this lower-level
input.

When you call EVT_CHAR(win, func), future keystrokes (“normal” keys, but not
modifiers such as Ctrl or Shift) directed to win will cause wxKeyEvents to be sent
to func. Use EVT_CHAR_HOOK to catch modifier keypresses, and EVT_KEY_DOWN and
EVT_KEY_UP to be notified when keys are pressed or released, respectively.

Tip Only one window has keyboard focus at any time, so your window will receive
keystroke notifications only if it has the focus. Use the window's SetFocus()
method to acquire keyboard focus.

If you want only to intercept some input but let wxPython handle the rest, your
handler function can pass the input on to the window’s normal handler. For example, if
you want a keypress to be interpreted using the normal behavior, your handler
should call the window’s OnChar method.

For catching mouse button click events, use EVT_LEFT_DOWN, EVT_LEFT_UP, and
EVT_LEFT_DCLICK to capture mouse left button presses, releases, and double-clicks,
respectively. There are corresponding functions for the middle and right buttons as well.

Use EVT_MOTION to have each mouse movement reported, and use
EVT_ENTER_WINDOW and EVT_LEAVE_WINDOW to be notified when the mouse enters
or leaves the window. If you want to process all mouse events, just use
EVT_MOUSE_EVENTS to capture them all.


Other wxPython Features 

As mentioned before, wxPython has far more features than can adequately be cov- 
ered in one chapter. This final section is here to pique your interest enough to do 
some investigating on your own, and to ensure that you don’t invest a lot of time 
implementing something that wxPython already has. 
Clipboard, drag and drop, and cursors

You can create and change mouse cursors and tool tips with the wxCursor and
wxToolTip classes and their children.

The wxClipboard, wxDataFormat, and wxDataObject class hierarchies implement
support for transferring data to and from the clipboard and converting it between
different formats. The wxDragImage class is useful for implementing your own
visual representation of dragging a file or other object in your application. See the
wxDropSource and wxDropTarget classes too.

If you call a window’s SetCursor(cursor) method, the mouse cursor will change
to the given cursor any time it enters the window. You can create your own cursor
or use one of the built-in cursors:

myFrame.SetCursor(wxStockCursor(wxCURSOR_BULLSEYE)) 

Graphics 

The Object Graphics Library (OGL) is a library for creating and easily manipulating
flowcharts and other graphs. See the wxShapeCanvas, wxShape, and wxDiagram
classes for more information.

wxBitmap, wxImage, and wxIcon all deal with loading and displaying images in
different ways. For each file type you use, you must load a wxImageHandler instance
that handles decoding the image data (wxPython comes with several, such as
wxJPEGHandler and wxGIFHandler). See also the wxMask and wxPalette classes.

If you have installed the PyOpenGL extension module, you can use wxGLCanvas to 
include an OpenGL window in your application. 

Date and time 

wxPython has powerful date and time support (covering dates even hundreds of
millions of years in the future). wxDateTime represents a specific point in time,
whereas wxDateSpan and wxTimeSpan represent intervals.

The wxCalendarCtrl is a control that looks like a wall calendar and is useful for
both displaying and inputting dates.

Fonts 

wxFont objects hold information about fonts, and wxFontData objects hold the
settings for the dialog that users use to choose fonts and set font properties.
HTML

The wxPython.html module contains classes for parsing, printing, and displaying
HTML pages in a window or a device context.

Printing 

wxPrintDialog and wxPageSetupDialog wrap two dialogs used for configuring
the printer and page in preparation for printing, and wxPrinter and wxPrintout
take care of the actual printing. There are also the wxPrintPreview and
wxPreviewFrame classes for supporting print preview.

Other 

Finally, if you’re using Windows and want to use COM, you can dynamically create a
wxWindows-like class to embed any ActiveX control in your application by using
wxPython.lib.activexwrapper.MakeActiveXClass.


Summary 

wxPython is a powerful library for creating cross-platform GUI applications. It has a
full set of simple and high-level controls, including built-in support for trees and
tables; and it is very easy to use. In this chapter you:

♦ Learned the basic structure of most wxPython applications.

♦ Created powerful and functional GUI-based applications in very few lines of
code.

♦ Used wxPython’s built-in dialogs for browsing for files, choosing colors, and
so on.

♦ Reviewed the different types of windows, controls, and features that
wxPython provides.

The next chapter shows you how to use the curses (not the spoken kind) library to 
create text-based user interfaces. 


Using Curses 



Curses is a library for handling a text-based display terminal.
It is widely used on UNIX. It can handle text windows,
colors, and keyboard input. Moreover, it saves you the trouble
of learning the control codes for every kind of terminal.


A Curses Overview

In ancient days of yore, there was not a computer in every
office. People used terminals like the VT100 to connect to a
central system. These terminals displayed a grid on which
each square contained a text character. Sending control codes
to the terminal could change the color, move the cursor, and
so on. However, the magical control codes varied between
systems. Therefore, a program that produced cute output on a
Tektronix 4105 terminal might have produced bizarre symbol
salad on a VT230.

The curses library was born as a portable tool for text display. 
It has been eclipsed by ncurses, which adds some features. 

The Python module curses is a thin wrapper for the ncurses 
API. The various functions in the curses API have some over- 
lap — for example, the window methods addch, addstr, and 
addnstr all print text. For purposes of brevity, this chapter 
omits many redundant items. 

Curses provides a class, WindowObject, for display. You can
use one or more windows, resize them, move them, and so
forth.

Note In curses, the top-left corner square of the screen has
coordinates (0,0). Screen coordinates in curses are given with
vertical position first: (y, x). This is the opposite of the
usual ordering, so be careful not to get your coordinates
reversed!

Listing 22-1 provides a simple curses program. Run it to get 
some quick gratification (and to make sure that curses is 
installed on your system!) 


In This Chapter

A Curses overview

Starting up and shutting down

Displaying and erasing text

Moving the cursor

Getting user input

Managing windows

Editing text

Using color

Example: a simple maze game
Listing 22-1: CurseWorld.py


import curses
try:
    MainWindow=curses.initscr() # initialize curses
    MainWindow.addstr("Hello, damn it!")
    MainWindow.refresh()
    MainWindow.getch() # Read a keystroke
finally:
    curses.endwin() # de-initialize curses


Starting Up and Shutting Down 

The function initscr initializes curses and returns a Window object representing
the whole screen. The function endwin de-initializes curses. The function isendwin
returns true if endwin has been called.

The module function wrapper(mainfunc,*args) handles typical setup and
shutdown for you. Calling wrapper sets up curses, creates a window, and calls
mainfunc(window,*args). It also restores the terminal to normal when your main
function completes, even if it terminates abnormally. This is important, because a
curses program that doesn’t call endwin may leave the shell in a highly weird state!
For reference, wrapper does (and later undoes) the following things:

♦ Creates a window (curses.initscr())

♦ Turns off echo (curses.noecho())

♦ Turns off keyboard buffering (curses.cbreak())

♦ Activates color, where available (curses.start_color())
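The guarantee that matters most here is the try/finally shape: cleanup runs even when the main function fails. The sketch below shows that shape in plain Python so it can run without a terminal. run_wrapped is a hypothetical helper, and its setup and teardown arguments are stand-ins for the initscr/noecho/cbreak and endwin steps listed above; in a real program you would simply call curses.wrapper.

```python
def run_wrapped(setup, teardown, main, *args):
    """Call setup(), pass its result to main, and always call teardown().

    Hypothetical sketch of the pattern curses.wrapper provides; setup and
    teardown stand in for the real curses initialization and cleanup.
    """
    screen = setup()
    try:
        return main(screen, *args)
    finally:
        teardown()   # runs on normal return and on exceptions


calls = []

def fake_setup():
    calls.append("setup")
    return "screen"

def fake_teardown():
    calls.append("teardown")

def failing_main(screen):
    raise RuntimeError("main failed")

try:
    run_wrapped(fake_setup, fake_teardown, failing_main)
except RuntimeError:
    pass
print(calls)  # ['setup', 'teardown']
```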

Caution The functions filter and use_env, which must be called before initscr, do
not work (as of Python 2.0 and 2.1).


Displaying and Erasing Text 

The window method addstr([y,x,]text[,attributes]) prints the string text at
screen position (y, x) or, by default, at the current cursor position. You can specify
attributes to control the appearance of the text. Attributes can be combined by
bitwise-OR. See Table 22-1 for a list of available text attributes:
Table 22-1
Text Attributes

Attribute      Meaning
A_BLINK        Blinking text
A_BOLD         Bold text
A_DIM          Dim text
A_NORMAL       Ordinary text
A_STANDOUT     Highlighted text
A_UNDERLINE    Underlined text


For example, the following code prints a bold, blinking “Howdy!” at column 50 of 
row 5: 

MainWindow.addstr(5,50,"Howdy!", curses.A_BLINK | curses.A_BOLD)


Inserting 

addstr overwrites any text that was already on the window. To insert text, call
insstr([y,x,]str[,attributes]). Any characters on the line are moved to the
right; characters moved off the right edge of the screen are lost. A call to insertln
inserts a blank row under the cursor; all following rows move down by one.
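The effect of insstr on a single line can be modeled with ordinary string slicing, which makes the "characters pushed off the right edge are lost" rule easy to see. insert_text is a hypothetical helper working on a fixed-width line of text, not a curses call; it assumes the line is padded with blanks to the window width.

```python
def insert_text(line, col, text, width):
    """Model insstr on one line: insert text at col and shift the rest
    right; anything pushed past `width` columns is lost.

    Hypothetical helper for illustration, not part of curses.
    """
    line = line.ljust(width)[:width]     # pad or trim to the window width
    return (line[:col] + text + line[col:])[:width]


print(repr(insert_text("abcdef", 2, "XY", 6)))  # 'abXYcd'
```

Inserting "XY" at column 2 of a six-column line pushes "e" and "f" off the right edge, just as the text describes.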

Default attributes 

The method attrset(attributes) sets the default attributes for all subsequent
calls to addstr. The methods attron(attribute) and attroff(attribute) turn a
single default attribute on or off, respectively.

Reading from the window (screen-scraping) 

The method inch(y,x) returns the character at the given window position.
Actually, it returns the character as a number in the lower eight bits, and the
attributes in the upper twenty-four bits. Therefore, the following code would check
for a bold X at row 3, column 10:

Character = MainWindow.inch(3,10)
Letter = chr(Character & 0xFF)
Attributes = Character & (~0xFF)
return ((Attributes & curses.A_BOLD) and (Letter=="X"))

The method instr([y,x,]n) returns a string of n characters, extracted from the 
specified screen position. It ignores attribute information. 
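To experiment with the bit layout that inch uses without opening a curses screen, you can pack and unpack such values yourself. The attribute bits below are stand-ins for this sketch (they happen to mirror the common ncurses layout, but real programs should use curses.A_BOLD and friends), and pack, unpack, and is_bold_x are hypothetical helpers.

```python
# Stand-in attribute bits for this sketch; use curses.A_* in real code.
A_UNDERLINE = 1 << 17
A_BOLD = 1 << 21
CHAR_MASK = 0xFF          # the low eight bits hold the character


def pack(ch, attrs=0):
    """Build a value shaped like the one inch returns."""
    return ord(ch) | attrs


def unpack(value):
    """Split an inch-style value into (character, attribute bits)."""
    return chr(value & CHAR_MASK), value & ~CHAR_MASK


def is_bold_x(value):
    """The bold-X check from the text, applied to a packed value."""
    letter, attrs = unpack(value)
    return bool(attrs & A_BOLD) and letter == "X"


v = pack("X", A_BOLD | A_UNDERLINE)
print(is_bold_x(v))  # True
```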
Erasing

The method erase clears the window. clrtoeol erases from the current cursor
position to the end of the line; clrtobot also clears all lines below the cursor. The
method delch([y,x]) erases a single character (by default, the one under the
cursor); characters to its right move left by one square. deleteln deletes the line
under the cursor; any following lines move up by one row.
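The shifting behavior of delch and deleteln can be modeled on plain Python strings and lists. delete_char and delete_line are hypothetical helpers, not curses calls; they assume, as curses does, that a blank fills the vacated space at the end of the line or the bottom of the window so that sizes stay the same.

```python
def delete_char(line, col):
    """Model delch: remove the character at col, shift the rest left,
    and pad the end with a blank so the line keeps its width."""
    return line[:col] + line[col + 1:] + " "


def delete_line(rows, y):
    """Model deleteln: remove row y, move later rows up, and add a
    blank row at the bottom so the window keeps its height."""
    width = len(rows[0]) if rows else 0
    return rows[:y] + rows[y + 1:] + [" " * width]


print(repr(delete_char("abcdef", 2)))          # 'abdef '
print(delete_line(["aa", "bb", "cc"], 1))      # ['aa', 'cc', '  ']
```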

Refreshing 

After changing the contents of a window, call its refresh method to repaint the
actual screen. If you get tired of calling refresh, call immedok(flag) to set the
“immediate refresh” flag; if the flag is set, the window will be repainted after every
change. However, note that this can result in reduced speed and/or flickering.

If you are using several windows at once, the most efficient way to repaint is to call
the noutrefresh method of each window (instead of refresh), and then call the
doupdate function once.

You can flag a window as “dirty” to ensure that it will be redrawn at the next refresh
call. The methods touchwin and untouchwin mark the entire window as dirty or
clean, respectively. touchline(y,count) marks count lines as dirty, starting with
line y. The methods is_linetouched(y) and is_wintouched return true if the
specified line, or the window itself, is dirty.
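The bookkeeping behind these methods amounts to one "dirty" flag per line. The toy class below models that idea in plain Python so you can see how the calls interact; DirtyLines is hypothetical, and it borrows the curses method names only for readability. It is not the curses API, which tracks changes internally.

```python
class DirtyLines:
    """Toy model of curses change tracking: one dirty flag per line."""

    def __init__(self, height):
        self.dirty = [False] * height

    def touchwin(self):
        self.dirty = [True] * len(self.dirty)

    def untouchwin(self):
        self.dirty = [False] * len(self.dirty)

    def touchline(self, y, count):
        # Mark `count` lines starting at y, stopping at the window bottom
        for row in range(y, min(y + count, len(self.dirty))):
            self.dirty[row] = True

    def is_linetouched(self, y):
        return self.dirty[y]

    def is_wintouched(self):
        return any(self.dirty)


w = DirtyLines(5)
w.touchline(1, 2)
print([w.is_linetouched(y) for y in range(5)])
# [False, True, True, False, False]
```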

Boxes and lines 

The method border draws a border around the window’s edges. The border is
made up of individual characters. If you like, you can specify the characters to
display, by passing them (as integers) to border(W,E,N,S,NW,NE,SW,SE). Here, S is
the character to use for the bottom edge, NE is the character to use for the
top-right corner, and so forth. Pass 0 as a character to use the default.
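The argument order of border is easy to mix up, so here is a plain-Python sketch that draws the same kind of frame as a list of strings, taking its characters in the same W, E, N, S, NW, NE, SW, SE order. ascii_border is a hypothetical helper that uses one-character strings for readability; the real border method draws on a curses window.

```python
def ascii_border(height, width, W="|", E="|", N="-", S="-",
                 NW="+", NE="+", SW="+", SE="+"):
    """Return the rows of a frame drawn in border()'s argument order.

    Hypothetical helper: sides first (W, E, N, S), then corners
    (NW, NE, SW, SE).
    """
    inner = width - 2
    rows = [NW + N * inner + NE]                 # top edge
    for _ in range(height - 2):
        rows.append(W + " " * inner + E)         # left and right edges
    rows.append(SW + S * inner + SE)             # bottom edge
    return rows


print("\n".join(ascii_border(3, 6)))
print(ascii_border(3, 4, "[", "]", "^", "v", "1", "2", "3", "4"))
# ['1^^2', '[  ]', '3vv4']
```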

You can draw an arbitrary box by calling
curses.textpad.rectangle(window,Top,Left,Bottom,Right) (after importing
curses.textpad). The box uses line-drawing characters where available. Otherwise,
it will fall back to standard ASCII-art pluses, pipes, and dashes.

The window background 

Windows have a background. The method bkgdset(character[,attributes])
changes the window’s background. When the window (or a portion of it) is erased,
it is painted with character, with the specified attributes. Furthermore, the specified
attributes are combined with any nonblank characters drawn on the window. The
similar method bkgd(character[,attributes]) immediately paints blank
squares of the window with character.
Example: masking a box

Listing 22-2 illustrates a simple Mask class, for temporarily covering a part of the
screen. A mask can cover a rectangular block of a window with a call to Cover, and
then restore the original text with a call to Reveal.


Listing 22-2: Mask.py 


import curses

class Mask:
    def __init__(self,Window,Top,Bottom,Left,Right):
        self.Window=Window
        self.Top=Top
        self.Bottom=Bottom
        self.Left=Left
        self.Right=Right
        self.OldText=None

    def Cover(self,Character="X",Attributes=curses.A_DIM):
        # Cover the current screen contents. Store
        # them in OldText[RowIndex][ColumnIndex] for later:
        self.OldText=[]
        for Row in range(self.Top,self.Bottom+1):
            self.OldText.append([])
            for Col in range(self.Left,self.Right+1):
                self.OldText[-1].append(self.Window.inch(Row,Col))
                self.Window.addstr(Row,Col,Character,Attributes)

    def Reveal(self):
        if self.OldText is None: return
        for Row in range(self.Top,self.Bottom+1):
            CurrentLine=self.OldText[Row-self.Top]
            for Col in range(self.Left,self.Right+1):
                CurrentCol=(Col-self.Left)
                # Low 8 bits hold the character; the rest are attributes
                Character=chr(CurrentLine[CurrentCol] & 0xFF)
                Attributes=CurrentLine[CurrentCol] & (~0xFF)
                self.Window.addstr(Row,Col,Character,Attributes)

def Main(MainWindow):
    MainWindow.addstr(10,10,"Yes it is!")
    MainWindow.addstr(11,10,"No it isn't!",curses.A_BOLD)
    MainWindow.addstr(12,10,"Yes it is!",curses.A_UNDERLINE)
    MainWindow.addstr(13,10,"No it isn't!",curses.A_STANDOUT)
    MainWindow.addstr(14,10,"YES IT IS!",curses.A_BOLD)
    MyMask=Mask(MainWindow,10,20,10,40)
    MainWindow.refresh()
    MainWindow.getch()
    MyMask.Cover()
    MainWindow.refresh()
    MainWindow.getch()
    MyMask.Reveal()
    MainWindow.refresh()
    MainWindow.getch()

if (__name__=="__main__"):
    curses.wrapper(Main)
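Reveal separates each value returned by inch into a character and its attributes using the literal mask 0xFF. The sketch below does the same split with the named masks curses.A_CHARTEXT and curses.A_ATTRIBUTES; split_cell and join_cell are hypothetical helper names, not curses functions, and the cell value is built by hand so the code runs without a terminal.

```python
import curses

def split_cell(value):
    """Split an inch()-style value into (character, attribute bits)."""
    # A_CHARTEXT keeps the character code; A_ATTRIBUTES keeps everything
    # else (bold, underline, color pair, ...).
    return chr(value & curses.A_CHARTEXT), value & curses.A_ATTRIBUTES

def join_cell(character, attributes):
    """Pack a character and its attributes the way inch() reports them."""
    return ord(character) | attributes

# A hand-built cell: the letter X drawn bold and underlined.
cell = join_cell("X", curses.A_BOLD | curses.A_UNDERLINE)
char, attrs = split_cell(cell)
```

In Reveal, the two lines that compute Character and Attributes could then become a single call such as Character,Attributes=split_cell(CurrentLine[CurrentCol]).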


Moving the Cursor 

The function getsyx returns the cursor’s screen position in the form of a tuple
(y, x). The function setsyx(y, x) moves the cursor to the specified position.

The window methods getyx and move(y,x) check and set the cursor position
within a window. If the window fills the screen (as the window returned by a call to
initscr does), window positioning is the same as screen positioning.

The window method getparyx returns the window’s coordinates relative to its 
parent window. These coordinates are the location (in the parent) of the window’s 
top-left corner. If the window has no parent, getparyx returns (-1, -1). Note that 
cursor position is tracked independently by every window. 

The window method getmaxyx returns the size of the window in a tuple of the form
(height, width). Note that getmaxyx()[0] is not a valid y-coordinate, as row
numbering is 0-based; the last row of the window has y-coordinate getmaxyx()[0]-1.
The same is true for x-coordinates.
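Because numbering starts at 0, code that positions the cursor from computed values often has to clamp them to the last valid row and column. A minimal sketch, assuming you pass in the tuple that getmaxyx returns (clamp_yx is a hypothetical helper, not part of curses):

```python
def clamp_yx(maxyx, y, x):
    """Clamp (y, x) into a window whose getmaxyx() result is maxyx.

    maxyx is (height, width); the largest legal coordinates are
    height-1 and width-1 because rows and columns are numbered from 0.
    """
    height, width = maxyx
    return (max(0, min(y, height - 1)), max(0, min(x, width - 1)))
```

For example, Window.move(*clamp_yx(Window.getmaxyx(), y, x)) never asks for a position outside the window.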

The window method leaveok(flag) toggles the “leave-the-cursor-where-it-is-
after-repainting-the-screen” flag. Calling leaveok(1) is a good idea if a blinking
cursor won’t convey useful information to the user. If the flag is set, getsyx returns
(-1, -1); calling setsyx(-1,-1) sets the flag to true.

The function curs_set(visibility) sets the cursor visibility to 0 (invisible); 1
(visible, often an underline); or 2 (very visible, often a block). The return value
of curs_set is the old visibility level.

Listing 22-3 paints a spiral pattern on the window, using cursor positioning. 
Listing 22-3: Spiral.py


import curses
import math

def DrawSpiral(Window,CenterY,CenterX,Height,Width):
    ScalingFactor=1.0
    Angle=0
    HalfHeight=float(Height)/2
    HalfWidth=float(Width)/2
    while (ScalingFactor>0):
        Y=CenterY+(HalfHeight*math.sin(Angle)*ScalingFactor)
        X=CenterX+(HalfWidth*math.cos(Angle)*ScalingFactor)
        Window.move(int(Y),int(X))
        Window.addstr("*") # any single character will do
        Angle+=0.05
        ScalingFactor=ScalingFactor-0.001
        Window.refresh()

def Main(Window):
    (Height,Width)=Window.getmaxyx()
    Height-=1 # Don't make the spiral too big
    Width-=1
    CenterY=Height/2
    CenterX=Width/2
    DrawSpiral(Window,CenterY,CenterX,Height,Width)
    Window.getch()

if __name__=="__main__":
    curses.wrapper(Main)
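The coordinate math in DrawSpiral can be checked without a terminal by separating it from the drawing calls. The following sketch (spiral_points is a hypothetical helper) reproduces the loop and returns the cells it would visit; DrawSpiral could then call move and addstr for each point. For example, with Height=23, Width=79 and the center at (11.5, 39.5), every point lands within rows 0 to 23 and columns 0 to 79.

```python
import math

def spiral_points(center_y, center_x, height, width,
                  angle_step=0.05, shrink=0.001):
    """Return the integer (row, column) cells the spiral loop visits, in order."""
    points = []
    scaling, angle = 1.0, 0.0
    half_h, half_w = height / 2.0, width / 2.0
    while scaling > 0:
        y = center_y + half_h * math.sin(angle) * scaling
        x = center_x + half_w * math.cos(angle) * scaling
        points.append((int(y), int(x)))   # int() truncates, as in the listing
        angle += angle_step
        scaling -= shrink
    return points
```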


Getting User Input

Curses starts out in cooked mode: the user’s keyboard input is buffered and
processed one line at a time. In raw mode, buffering is turned off, and keys are
processed as they are pressed. Call the functions raw and noraw to toggle between
modes.

In addition, you can call cbreak and nocbreak to switch cbreak mode (also known
as “rare” mode) on and off. The difference between cbreak and raw is that special
characters (such as suspend) lose their normal effects in raw mode. The modes set by
these four functions (raw, noraw, cbreak, and nocbreak) are mutually exclusive: only
one of cooked, cbreak, or raw mode is in effect at a time.

The window method keypad(flag) toggles keypad mode for a window. If keypad
mode is not set, special character codes are not interpreted by curses. This means
that special keystrokes such as function keys will put several special characters
into the keyboard buffer, extended keystrokes will not be available, and mouse
events will not be available. In general, you want keypad mode on!

Call echo and noecho to toggle the automatic echoing of user input to the screen.
By default, echoing is on; curses.wrapper turns echoing off, switches to
cbreak mode, and turns keypad mode on.

Reading keys 

The window method getch reads a character and returns it as an integer. For an 
ASCII character, the value returned is the character’s ASCII value (as returned by 
ord); other characters (such as function keys) may return non-ASCII values. The
method getkey reads a character, returning it as a string. 

Both getch and getkey are normally synchronous; they wait until the user presses
a key. The method nodelay(flag) makes them asynchronous if flag is true. In
asynchronous (no-delay) mode, if no keypress is available, getch returns -1 right
away, and getkey raises a curses.error exception.
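In no-delay mode a program typically polls: it handles whatever keys are already waiting, then goes on with other work. A sketch of that pattern, assuming nodelay(1) has been called on the window; drain_keys is a hypothetical helper that takes the bound getch method, so you would call drain_keys(Window.getch).

```python
def drain_keys(getch):
    """Collect every key already waiting, without blocking.

    getch is expected to behave like a window's getch() after nodelay(1):
    it returns -1 as soon as no keypress is pending.
    """
    keys = []
    while True:
        key = getch()
        if key == -1:          # input queue is empty: stop polling
            return keys
        keys.append(key)
```

Returning as soon as the queue is empty lets the caller redraw or do other work between polls instead of spinning on getch.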

The method getstr reads a string from the user, handling things such as
backspacing in the process. Note that getstr doesn’t play well with nodelay or
noecho. In fact, getstr is quite primitive; see “Editing Text” for a more pleasant way
to extract input from your users.

Other keyboard-related functions 

You can throw a character onto the keyboard buffer by calling the function 
ungetch(character). The next call to getch will return character. You can only
“un-get” one character at a time. 

A call to the function flushinp clears out the input buffers, throwing away any
pending input that you haven’t processed yet.

Fancy characters 

When keypad mode is active, curses translates the multi-character sequences sent
by special keys (function keys, arrow keys, and so on) into single key codes. Most of
these codes have corresponding constants. For example, the following code
fragment checks whether the user pressed F5:

Char=Window.getch()
if Char==curses.KEY_F5:
    # do stuff!
    pass

Arrow keys (where available) are represented by KEY_UP, KEY_LEFT, KEY_RIGHT,
and KEY_DOWN. See the curses documentation for a complete list of these
constants.
In addition, the module curses.ascii provides constants and functions for cleanly
handling ASCII characters. For example, curses.ascii.SP is equal to 32 (the ASCII
value for a space); curses.ascii.BEL is 7 (the bell character; Ctrl-G on most systems).
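Together, the KEY_* constants and curses.ascii make it easy to turn a getch result into something readable, for a status line or a debugging log. A sketch follows; describe_key and SPECIAL_KEYS are hypothetical names, while curses.ascii.unctrl is the real function that renders control characters in caret form.

```python
import curses
import curses.ascii

# Labels for a few special keys; add other curses.KEY_* constants as needed.
SPECIAL_KEYS = {
    curses.KEY_UP: "Up",
    curses.KEY_DOWN: "Down",
    curses.KEY_LEFT: "Left",
    curses.KEY_RIGHT: "Right",
    curses.KEY_F5: "F5",
    curses.KEY_MOUSE: "Mouse",
}

def describe_key(code):
    """Turn a getch() result into a short human-readable label."""
    if code in SPECIAL_KEYS:
        return SPECIAL_KEYS[code]
    if 0 <= code <= 127:
        # unctrl shows control characters in caret form, e.g. 7 -> "^G"
        return curses.ascii.unctrl(code)
    return "key %d" % code
```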

Reading mouse input 

In order to detect mouse events, you must call the function mousemask(mask),
where mask represents the mouse events you want to see. The return value has the
form (available, old). Here, available is a mask of the events that will be
reported (hopefully, the same as mask), and old is the old event mask. For example,
the following code tries to watch for clicks and double-clicks of button 1 (the left
button):

(available,old)=curses.mousemask(curses.BUTTON1_PRESSED |
                                 curses.BUTTON1_DOUBLE_CLICKED)
if (available & curses.BUTTON1_PRESSED):
    CanSeeClick=1
else:
    CanSeeClick=0

You also need to turn keypad mode on; otherwise, mouse events are not visible. 

Mouse events are first signaled by a value of KEY_MOUSE returned by getch. At this
point, you can examine the mouse input with a call to the function getmouse. The
return value is a tuple of the form (id, x, y, z, state). Here, x and y are the
coordinates of the mouse click, state is the event type, and id and z can be safely ignored.

Table 22-2 describes all the available mouse events. A particular event (or event
mask) may be a bitwise-OR of several of them. The pound sign (#) represents a
number from 1 to 4.


Table 22-2
Mouse Events

Name                      Meaning
BUTTON#_PRESSED           Button # was pressed
BUTTON#_RELEASED          Button # was released
BUTTON#_CLICKED           Button # was clicked
BUTTON#_DOUBLE_CLICKED    Button # was double-clicked
BUTTON#_TRIPLE_CLICKED    Button # was triple-clicked
BUTTON_SHIFT              Button was Shift-clicked
BUTTON_CTRL               Button was Control-clicked
BUTTON_ALT                Button was Alt-clicked
The function ungetmouse(id,x,y,z,state), similar to ungetch, pushes a mouse
event back onto the buffer.
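Because state is a bit mask, each flag from Table 22-2 is tested with a bitwise AND. A sketch that decodes the tuple returned by getmouse for button 1; describe_mouse_event and the two flag lists are hypothetical, and only the BUTTON* constants come from curses. The tuple is built by hand so no terminal is needed.

```python
import curses

# Button-1 flags from Table 22-2 (with # replaced by 1), plus the modifiers.
BUTTON1_FLAGS = [
    ("pressed", curses.BUTTON1_PRESSED),
    ("released", curses.BUTTON1_RELEASED),
    ("clicked", curses.BUTTON1_CLICKED),
    ("double-clicked", curses.BUTTON1_DOUBLE_CLICKED),
    ("triple-clicked", curses.BUTTON1_TRIPLE_CLICKED),
]
MODIFIER_FLAGS = [
    ("Shift", curses.BUTTON_SHIFT),
    ("Ctrl", curses.BUTTON_CTRL),
    ("Alt", curses.BUTTON_ALT),
]

def describe_mouse_event(event):
    """Decode a getmouse() tuple (id, x, y, z, state).

    Returns (y, x, modifiers, actions), with y first as elsewhere in curses.
    """
    _id, x, y, _z, state = event
    modifiers = [name for name, flag in MODIFIER_FLAGS if state & flag]
    actions = [name for name, flag in BUTTON1_FLAGS if state & flag]
    return y, x, modifiers, actions
```

In a real program the argument would come from curses.getmouse(), called right after getch returns curses.KEY_MOUSE.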


Example: yes, no, or maybe 

The program shown in Listing 22-4 provides three options, and lets the user choose 
one either by clicking it or by pressing a key. 


Listing 22-4: Deathray.py 


import curses
import curses.textpad
import random

class CursesButton:
    def __init__(self,Window,Y,X,Label,Underline=0):
        self.Y=Y
        self.X=X
        self.Label=Label
        self.Width=len(Label)+2 # label, plus lines on side
        self.Underline=Underline
        # Draw the button:
        curses.textpad.rectangle(Window,Y,X,Y+2,X+self.Width)
        # Draw the button label:
        Window.addstr(Y+1,X+1,Label,curses.A_BOLD)
        # Make the hotkey stand out:
        Window.addstr(Y+1,X+Underline+1,Label[Underline],
                      curses.A_REVERSE)
        Window.refresh()

    def KeyPressed(self,Char):
        if (Char>255): return 0 # skip special keys such as KEY_MOUSE
        if chr(Char).upper()==self.Label[self.Underline]:
            return 1
        else:
            return 0

    def MouseClicked(self,MouseEvent):
        (id,x,y,z,event)=MouseEvent
        if (self.Y <= y <= self.Y+2) and \
           (self.X <= x < self.X+self.Width):
            return 1
        else:
            return 0

def ShowDialog(Window):
    curses.mousemask(curses.BUTTON1_PRESSED)
    Window.addstr(5,0,"Really, REALLY fire death ray?")
    YesButton=CursesButton(Window,8,10,"Yes")
    NoButton=CursesButton(Window,8,20,"No")
    MaybeButton=CursesButton(Window,8,30,"Maybe")
    Buttons=[YesButton,NoButton,MaybeButton]
    Window.nodelay(1)
    Action=""
    while 1:
        Key=Window.getch()
        if (Key==-1):
            continue
        for Button in Buttons:
            if Button.KeyPressed(Key):
                Action=Button.Label
        # Handle mouse events:
        if (Key==curses.KEY_MOUSE):
            MouseEvent=curses.getmouse()
            for Button in Buttons:
                if Button.MouseClicked(MouseEvent):
                    Action=Button.Label
        if Action!="": break
    # Handle the actions
    if (Action=="Yes"):
        FireDeathRay(Window)
    if (Action=="No"):
        pass
    if (Action=="Maybe" and random.random()>0.5):
        FireDeathRay(Window)

def FireDeathRay(Window):
    Window.clear()
    # Kra-ppoowwww! Frrrraapppp!!
    Window.bkgd("X")
    Window.nodelay(0)
    Window.getch()

if __name__=="__main__":
    curses.wrapper(ShowDialog)


Managing Windows 

You can create a new, parentless window by calling the function
newwin([lines,columns,]y,x). The new window’s top-left corner will be at
(y, x). It will be lines rows high and columns columns wide; if you omit these
arguments, it stretches to the bottom-right edge of the screen. Similarly, you can
create a subwindow within an existing window by calling the method
subwin([lines,columns,]y,x).

The method mvwin(y, x) moves a window so that its upper-left corner is at (y, x).
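A common use of newwin is a dialog box centered on the screen. A minimal sketch of the arithmetic (centered_origin is a hypothetical helper); it returns the (y, x) pair to pass as the last two arguments of newwin:

```python
def centered_origin(screen_height, screen_width, lines, columns):
    """Return (y, x) so that newwin(lines, columns, y, x) is centered.

    Raises ValueError if the window cannot fit on the screen.
    """
    if lines > screen_height or columns > screen_width:
        raise ValueError("window larger than screen")
    return ((screen_height - lines) // 2, (screen_width - columns) // 2)
```

For instance, centered_origin(24, 80, 10, 40) gives (7, 20), so curses.newwin(10, 40, 7, 20) would center a 10-by-40 window on a 24-by-80 screen.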

Pads 

A pad is similar to a window, except that it can be larger than the screen. It is a
convenient way to make more data available than you can show all at once. It supports
all the methods of a window, but has a different refresh method.
The function newpad(rows, columns) creates a pad of the given size. To draw the
pad’s contents, call refresh(padTop,padLeft,screenTop,screenLeft,screenBottom,
screenRight). The region of the pad whose top-left corner is (padTop,padLeft) is
displayed in the screen rectangle with corners (screenTop,screenLeft) and
(screenBottom,screenRight).
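The fiddly part of using a pad as a scrolling viewport (as the maze example later in this chapter does) is choosing padTop and padLeft so the view follows a point of interest without running past the pad’s edges. A sketch of that calculation; pad_viewport is a hypothetical helper, and the visible region is assumed to be view_rows by view_cols.

```python
def pad_viewport(pad_rows, pad_cols, view_rows, view_cols, center_y, center_x):
    """Return (padTop, padLeft) that centers (center_y, center_x) in the view
    as far as the pad's edges allow."""
    top = center_y - view_rows // 2
    left = center_x - view_cols // 2
    # Never scroll past the bottom/right edge, then never before row/column 0.
    top = max(0, min(top, pad_rows - view_rows))
    left = max(0, min(left, pad_cols - view_cols))
    return top, left
```

The result feeds straight into pad.refresh(top, left, 0, 0, view_rows-1, view_cols-1) to show the view in the screen’s top-left corner.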

Stacking Windows 

The module curses.panel allows you to cleanly “stack” windows on top of each
other so that only the visible portion of each window is displayed. The function
new_panel(window) returns a panel that wraps the specified window. You can
change the panel’s stacking position by calling its methods bottom and top. You
can hide and reveal panels by calling hide and show. After changing panels, call
the function update_panels to update the virtual screen, then curses.doupdate
to repaint the screen. The function bottom_panel returns the bottom-most panel,
and top_panel returns the topmost panel.


Editing Text 

The module curses.textpad provides a class, Textbox, for convenient text
editing. The constructor takes one argument: the window in which to place the
Textbox.

Once you have a Textbox, you can call edit([validator]) to let the user enter
data, and call gather to retrieve the Textbox’s contents (as a string). The user can
type text, scroll around the Textbox, and finish input by pressing Ctrl-G (or Enter, if
the window has only one line). Because gather returns the entire window’s
contents, you generally want to create a special window for use by only your Textbox.

Table 22-3 describes the commands available within a Textbox. 


Table 22-3
Textbox Commands

Keystroke   Action
Ctrl-A      Go to left edge of window
Ctrl-B      Cursor left, wrapping to previous line if appropriate
Ctrl-D      Delete character under cursor
Ctrl-E      Go to right edge (stripspaces off) or end of line (stripspaces on)
Ctrl-F      Cursor right, wrapping to next line when appropriate
Ctrl-G      Terminate, returning the window contents
Ctrl-H      Delete character backward
Ctrl-J      Terminate if the window is one line; otherwise, insert newline
Ctrl-K      If line is blank, delete it; otherwise, clear to end of line
Ctrl-L      Refresh screen
Ctrl-N      Cursor down; move down one line
Ctrl-O      Insert a blank line at cursor location
Ctrl-P      Cursor up; move up one line


You can, optionally, pass a callback function to edit([validator]). This function
is called whenever the user presses a key, and the keystroke is passed as a
parameter. The Textbox then acts on the validator’s return value instead of the
original keystroke. For instance, use the following if you want Esc to finish input
in your Textbox:

import curses.ascii

def Validator(Ch):
    if Ch==curses.ascii.ESC:
        return curses.ascii.BEL # Ctrl-G ends editing
    else:
        return Ch
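The same idea extends to other keys. The sketch below (make_validator is a hypothetical helper) builds a validator that finishes on any of a chosen set of keys and also maps DEL (127) to Ctrl-H. That mapping assumes your terminal sends DEL for the Backspace key, as many do; translating it makes the Textbox treat it as a backward delete.

```python
import curses.ascii

def make_validator(finish_keys=(curses.ascii.ESC,)):
    """Build a validator for Textbox.edit() (hypothetical helper)."""
    def validator(ch):
        if ch in finish_keys:
            return curses.ascii.BEL   # Ctrl-G: the Textbox stops editing
        if ch == curses.ascii.DEL:
            return curses.ascii.BS    # treat DEL like Ctrl-H (delete backward)
        return ch                     # pass every other key through unchanged
    return validator
```

You would pass the result to edit, as in curses.textpad.Textbox(win).edit(make_validator()).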


Using Color 

The function has_colors returns true if the terminal can display colors. The function
start_color initializes color display: it should be called immediately after initscr
(curses.wrapper calls it for you when the terminal supports color).

Numbering 

Colors come in two forms: color numbers and color pairs. Color numbers range
from 0 to COLORS-1; they identify a color in the curses palette. Color pairs are valid
attributes to pass to Window.addstr; they identify a foreground color number and
a background color number. Therefore, each color pair is basically a pair of color
numbers.

Just to make things more interesting, color pairs are also numbered. Try not to
confuse pair numbers with color numbers. (Go on, I dare you, try! Actually, the
whole system starts to make sense after a while.)
The function color_pair(number) returns the color pair corresponding to the
given pair number; the opposite function, pair_number(pair), returns the pair
number of a color pair.

Setting colors 

Color pair 0 is always white on black. You can change the colors of the other pairs by
calling init_pair(pair_number, foreground, background). Here background and
foreground are color numbers. The function pair_content(pair_number) returns
the pair’s current colors as a tuple of the form (foreground,background).

The constants COLOR_BLACK, COLOR_RED, COLOR_GREEN, COLOR_YELLOW,
COLOR_BLUE, COLOR_MAGENTA, COLOR_CYAN and COLOR_WHITE are available
to denote the corresponding color numbers. For example, the following code draws
a simple German flag:

# In the next line, 1 is the number of a
# color pair, while curses.COLOR_WHITE is a
# color number:
curses.init_pair(1,curses.COLOR_WHITE,curses.COLOR_BLACK)
curses.init_pair(2,curses.COLOR_WHITE,curses.COLOR_RED)
curses.init_pair(3,curses.COLOR_WHITE,curses.COLOR_YELLOW)

Window.addstr(0,0," "*10,curses.color_pair(1))
Window.addstr(1,0," "*10,curses.color_pair(2))
Window.addstr(2,0," "*10,curses.color_pair(3))

Tweaking the colors 

Defining colors is not possible on most terminals. The function can_change_color
returns true on those terminals where it is. A call to init_color(number, red,
green, blue) redefines color number to have the specified intensities of red,
green, and blue. Intensity ranges from 0 to 1,000. The function
color_content(number) returns the current definition of color number as a tuple
of the form (red, green, blue).
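Color values are often given on a 0-255 scale, so a conversion to the 0-1,000 range that init_color expects is handy. A sketch (rgb255_to_curses is a hypothetical helper):

```python
def rgb255_to_curses(red, green, blue):
    """Convert 0-255 RGB components to the 0-1000 scale used by init_color."""
    return tuple(int(round(c * 1000 / 255.0)) for c in (red, green, blue))
```

On a terminal where can_change_color() returns true, curses.init_color(8, *rgb255_to_curses(255, 165, 0)) would redefine color number 8 as orange; using color 8 assumes curses.COLORS is greater than 8.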


Example: A Simple Maze Game

I have a soft spot in my heart for curses because I have spent more time than I care
to admit playing ASCII-based games such as Angband and Nethack. The program
shown in Listing 22-5 is far simpler, but it does use several curses features. It uses a
pad to hold a large maze, which the user can move around in.
Listing 22-5: Maze.py


import curses
import curses.ascii
import random

# Possible contents of maze-squares:
MAZE_WALL="X"
MAZE_ENTRANCE="*"
MAZE_HALLWAY="."

# Attributes for displaying maze squares:
MAZE_ATTRIBUTE={MAZE_WALL:curses.A_NORMAL,
                MAZE_ENTRANCE:curses.A_BOLD,
                MAZE_HALLWAY:curses.A_DIM}

# Simple class representing a compass direction:
class Direction:
    def __init__(self,Name,XDelta,YDelta):
        self.Name=Name
        self.XDelta=XDelta
        self.YDelta=YDelta
        self.Marker=Name[0]

    def SetOpposite(self,Dir):
        self.Opposite=Dir
        Dir.Opposite=self

NORTH=Direction("North",0,-1)
SOUTH=Direction("South",0,1)
EAST=Direction("East",1,0)
WEST=Direction("West",-1,0)
NORTH.SetOpposite(SOUTH)
EAST.SetOpposite(WEST)
VALID_DIRECTIONS=[NORTH,SOUTH,EAST,WEST]

# Maze creation uses direction "markers" to indicate how we got
# to a square, so that we can (later) backtrack:
MARKED_DIRECTIONS={NORTH.Marker:NORTH,SOUTH.Marker:SOUTH,
                   EAST.Marker:EAST,WEST.Marker:WEST}

# Map keystrokes to compass directions:
KEY_DIRECTIONS={curses.KEY_UP:NORTH,curses.KEY_DOWN:SOUTH,
                curses.KEY_LEFT:WEST,curses.KEY_RIGHT:EAST}

class Maze:
    def __init__(self,Size=11):
        # Maze size must be an odd number:
        if (Size%2==0):
            Size+=1
        self.Size=Size
        self.Pad=curses.newpad(self.Size+1,self.Size+1)
        self.FillWithWalls()

    def FillWithWalls(self):
        for Y in range(0,self.Size):
            self.Pad.addstr(Y,0,MAZE_WALL*self.Size,MAZE_ATTRIBUTE[MAZE_WALL])

    def Set(self,X,Y,Char):
        self.Pad.addstr(Y,X,Char,MAZE_ATTRIBUTE.get(Char,curses.A_NORMAL))

    def Get(self,X,Y):
        # Read back just the character, stripping the attribute bits:
        return chr(self.Pad.inch(Y,X) & 0xFF)

    def BuildRandomMaze(self):
        self.FillWithWalls()
        CurrentX=1
        CurrentY=1
        self.Set(CurrentX,CurrentY,MAZE_ENTRANCE)
        while (1):
            Direction=self.GetValidDirection(CurrentX,CurrentY)
            if (Direction!=None):
                # Take one step forward
                self.Set(CurrentX+Direction.XDelta,
                         CurrentY+Direction.YDelta,MAZE_HALLWAY)
                CurrentX+=Direction.XDelta*2
                CurrentY+=Direction.YDelta*2
                self.Set(CurrentX,CurrentY,Direction.Marker)
            else:
                # Backtrack one step
                BackDirectionMarker=self.Get(CurrentX,CurrentY)
                BackDirection=MARKED_DIRECTIONS[BackDirectionMarker].Opposite
                CurrentX+=BackDirection.XDelta*2
                CurrentY+=BackDirection.YDelta*2
                # If we backtracked to the entrance, the maze is done!
                if self.Get(CurrentX,CurrentY)==MAZE_ENTRANCE:
                    break
        # Fix up the maze:
        for X in range(0,self.Size):
            for Y in range(0,self.Size):
                if self.Get(X,Y) not in [MAZE_HALLWAY,MAZE_WALL,MAZE_ENTRANCE]:
                    self.Set(X,Y,MAZE_HALLWAY)

    def GetValidDirection(self,X,Y):
        DirectionIndex=random.randint(0,len(VALID_DIRECTIONS)-1)
        FirstIndex=DirectionIndex
        while (1):
            Direction=VALID_DIRECTIONS[DirectionIndex]
            NextSquare=(X+Direction.XDelta*2,Y+Direction.YDelta*2)
            if ((0 < NextSquare[0] < self.Size) and
                (0 < NextSquare[1] < self.Size) and
                self.Get(NextSquare[0],NextSquare[1])==MAZE_WALL):
                return Direction
            DirectionIndex+=1
            if (DirectionIndex>=len(VALID_DIRECTIONS)):
                DirectionIndex=0
            if (DirectionIndex==FirstIndex):
                return None

    def ShowSelf(self,ScreenLeft,ScreenTop,PlayerX,PlayerY,Radius):
        Top=PlayerY-Radius
        Bottom=PlayerY+Radius
        Left=PlayerX-Radius
        Right=PlayerX+Radius
        ScreenRight=ScreenLeft+Radius*2+1
        ScreenBottom=ScreenTop+Radius*2+1
        if (Top<0):
            ScreenTop -= Top
            Top=0
        if (Left<0):
            ScreenLeft -= Left
            Left=0
        if (Right>self.Size-1):
            ScreenRight-=(Right-(self.Size-1))
            Right=self.Size-1
        if (Bottom>self.Size-1):
            ScreenBottom-=(Bottom-(self.Size-1))
            Bottom=self.Size-1
        self.Pad.refresh(Top,Left,ScreenTop,ScreenLeft,ScreenBottom,ScreenRight)

def Main(Window):
    # Set up colors:
    curses.init_pair(1,curses.COLOR_GREEN,curses.COLOR_BLACK)
    curses.init_pair(2,curses.COLOR_BLUE,curses.COLOR_BLACK)
    curses.init_pair(3,curses.COLOR_RED,curses.COLOR_BLACK)
    MAZE_ATTRIBUTE[MAZE_HALLWAY] |= curses.color_pair(1)
    MAZE_ATTRIBUTE[MAZE_ENTRANCE] |= curses.color_pair(2)
    MAZE_ATTRIBUTE[MAZE_WALL] |= curses.color_pair(3)
    curses.curs_set(0) # invisible cursor
    MyMaze=Maze(20)
    MyMaze.BuildRandomMaze()
    PlayerX=19
    PlayerY=19
    LightRadius=3
    MazeWindow=curses.newwin(10,10,10+LightRadius*2+1,10+LightRadius*2+1)
    while 1:
        MazeWindow.erase()
        MyMaze.ShowSelf(10,10,PlayerX,PlayerY,LightRadius)
        Window.addch(10+LightRadius,10+LightRadius,"@",
                     curses.color_pair(2) | curses.A_STANDOUT)
        Window.refresh()
        Key=Window.getch()
        if (Key==ord('q') or Key==curses.ascii.ESC):
            break
        Direction=KEY_DIRECTIONS.get(Key,None)
        if (Direction):
            TargetSquare=MyMaze.Get(PlayerX+Direction.XDelta,
                                    PlayerY+Direction.YDelta)
            if TargetSquare==MAZE_ENTRANCE:
                MazeFinished(Window)
                break
            if TargetSquare==MAZE_HALLWAY:
                PlayerX += Direction.XDelta
                PlayerY += Direction.YDelta

def MazeFinished(Window):
    Window.clear()
    Window.addstr(5,5,"CONGRATULATION!",curses.color_pair(2))
    Window.addstr(6,5,"A WINNER IS YOU!",curses.color_pair(3))
    Window.getch()

if (__name__=="__main__"):
    curses.wrapper(Main)
    print("Bye!")
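BuildRandomMaze is a depth-first “recursive backtracker”: it carves forward while an unvisited neighbor exists and backs up when it hits a dead end. The sketch below implements the same algorithm on a plain list of lists so it can be tested without a terminal. It swaps the direction markers stored in the pad for an explicit stack of visited cells, and build_maze is a hypothetical name.

```python
import random

WALL, HALL = "X", "."

def build_maze(size, rng=random):
    """Carve a perfect maze into a size x size grid (size is forced odd).

    Cells sit at odd (x, y) coordinates; the square between two cells is
    knocked out whenever the depth-first walk moves from one to the other.
    """
    if size % 2 == 0:
        size += 1
    grid = [[WALL] * size for _ in range(size)]
    grid[1][1] = HALL
    stack = [(1, 1)]                     # path from the entrance to here
    while stack:
        x, y = stack[-1]
        # Unvisited neighbors are two squares away and still solid wall.
        options = [(dx, dy) for dx, dy in ((0, -1), (0, 1), (1, 0), (-1, 0))
                   if 0 < x + 2 * dx < size and 0 < y + 2 * dy < size
                   and grid[y + 2 * dy][x + 2 * dx] == WALL]
        if not options:
            stack.pop()                  # dead end: backtrack one step
            continue
        dx, dy = rng.choice(options)
        grid[y + dy][x + dx] = HALL          # knock out the wall between
        grid[y + 2 * dy][x + 2 * dx] = HALL  # step into the new cell
        stack.append((x + 2 * dx, y + 2 * dy))
    return grid
```

Because every cell is visited exactly once and joined to the maze through one knocked-out wall, the result is a perfect maze: there is exactly one path between any two hallway squares. For an 11-by-11 grid that means 25 cells plus 24 connecting squares.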



Summary 

The curses library is an easy, portable way to create a text-mode user interface. In 
this chapter, you used curses to: 

- Display and read text onscreen.
- Handle mouse and keyboard input.
- Use Textboxes for easy input.
- Draw colorful text.

The next chapter demonstrates various ways to create a command interpreter in 
Python, including the spiffy graphics language Lepto. 







Building Simple Command Interpreters

When someone says “user interface,” I usually think of
a GUI with nice buttons and menus, but sometimes a
more appropriate and powerful interface uses a custom mini-
language in which your users write small programs or scripts.
This chapter introduces Python’s support for such a user
interface and walks you through the process of creating a
graphical plotting application that is driven by a small,
custom scripting language called Lepto.


Beginning with the End in Mind 

The Python libraries covered in this chapter are the shlex
and cmd modules. The nature of these two modules makes it
difficult to cover each feature in isolation, so each section of
this chapter builds a portion of a single application. Once
you’ve seen the modules in that larger context, rereading the
explanations of the modules’ features will make more sense.

The application that you will build is a simple plotter (sort of
like the turtle graphics you find in languages like LOGO). It is
controlled by user-provided scripts, and the scripting
language provides basic movement commands and support for
creating subroutines (procedures).

If you imagine a spectrum on which you position program- 
ming languages according to their power and flexibility, the 
high end would contain Python, and the low end would con- 
tain this chapter’s language. Because one of the world’s 
largest snakes is a type of Python (around 10 meters long), I 
named this new language Lepto, short for Leptotyphlopidae, a 
type of blind snake that ranks as one of the world’s smallest, 
around 13 centimeters. 



In This Chapter

- Beginning with the end in mind
- Understanding the Lepto language
- Creating a Lepto lexical analyzer
- Adding interactive-mode features
- Executing Lepto commands
The following is a sample Lepto program, and Figure 23-1 shows the result of
running it through the finished application from this chapter:


Listing 23-1: leptogui.py - A sample Lepto program


C:\temp>leptogui.py
Welcome to Lepto! 

Enter a command or type 'help'

: color blue 
: scale 30 

: sub kochedge # A subroutine to draw an edge 
f 1 l 60 # f = forward
f 1 r 120 # l and r = turn
f 1 l 60
f 1 
r 120 
end 

: repeat 3 kochedge 
: scale 0.5 
: repeat 3 kochedge 
: scale 0.5 
: repeat 3 kochedge 
: scale 0.5 
: repeat 3 kochedge 



Figure 23-1: The result of running a simple Lepto program
Understanding the Lepto Language

Lepto programs are very simple; each line of a Lepto script contains one or more
complete statements. Blank lines and other whitespace are ignored, and comments
work as in Python: a pound symbol (#) and everything after it on the same line
form a comment.

Table 23-1 explains the statements Lepto supports. 



Table 23-1
Valid Lepto Statements

Statement          Description
f amnt             Move forward (in the current direction) amnt units
b amnt             Move backward (away from the current direction) amnt units
l amnt             Turn left amnt degrees
r amnt             Turn right amnt degrees
scale amnt         Multiply the current scale by amnt. Initially the scale is 1,
                   meaning one pixel for each unit of movement.
color name         Change the current drawing color to name. A color name is
                   any valid Tkinter color.
push arg           Save a state attribute to its own stack for later retrieval.
                   arg can be one of color, direction, scale, or position.
pop arg            Restore a previously saved state attribute. arg is one of
                   color, direction, scale, or position. No effect results if
                   the stack is empty.
reset arg          Restore a state attribute to its original value. arg can be
                   one of direction, color, screen, scale, position, or all.
include file       Read and execute the contents of the file named file as if
                   the contents had been entered from the console.
sub name           Begin the creation of a new subroutine called name.
                   Overwrites any previous subroutine of the same name.
end                Finish creating a new subroutine
call name          Execute the subroutine called name
repeat count sub   Repeatedly execute the subroutine called sub count times


The features of this language are obviously limited so that the example isn’t too 
cumbersome, but it has enough functionality to be interesting. 
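To make the push, pop, and reset rows of Table 23-1 concrete, here is a sketch of how an interpreter might keep that state. LeptoState is a hypothetical class, not the one this chapter builds. The starting scale of 1 comes from the table; the starting color, direction, and position are assumptions, and reset screen (clearing the canvas) is left out.

```python
class LeptoState:
    """Hypothetical interpreter state: one value and one save-stack per attribute."""

    # scale=1 comes from Table 23-1; the other starting values are assumptions.
    DEFAULTS = {"color": "black", "direction": 0.0,
                "scale": 1.0, "position": (0.0, 0.0)}

    def __init__(self):
        self.values = dict(self.DEFAULTS)
        self.stacks = {name: [] for name in self.DEFAULTS}

    def push(self, arg):
        """push arg: save the current value on that attribute's own stack."""
        self.stacks[arg].append(self.values[arg])

    def pop(self, arg):
        """pop arg: restore the last saved value; no effect if the stack is empty."""
        if self.stacks[arg]:
            self.values[arg] = self.stacks[arg].pop()

    def reset(self, arg):
        """reset arg: restore one attribute (or all of them) to its starting value."""
        names = list(self.DEFAULTS) if arg == "all" else [arg]
        for name in names:
            self.values[name] = self.DEFAULTS[name]
```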
Creating a Lepto Lexical Analyzer

Users will create Lepto programs using a text editor or by entering them via an
interactive console. Either way, their input will be plain text, so the first step
toward a finished application is parsing the text input and spitting out Lepto
commands in some internal format that the rest of the program can understand. During
this conversion, the parser will also verify that the Lepto commands are valid
according to the simple grammar explained in the previous section.

The shlex module

Python’s shlex module provides basic lexical analysis for simple shell-like
languages. It defines the shlex class, which you can use as is or through your own
subclass. You create a shlex object by calling shlex([instream[, infile]]),
where instream is an open filelike object and infile is the file’s name (printed
with error messages). If you provide neither, then shlex uses stdin. shlex breaks
the input down into individual words, or tokens.

A shlex object has several members, which you can modify to affect how it
interprets the input stream. The commenters member is a string of all valid comment
characters (defaulting to #), and quotes is a string with all valid quote characters
(defaulting to single and double quotes). If a comment character is in the middle of
a token (with no surrounding whitespace), it counts as a single token that just so
happens to contain the comment character.

The whitespace member is a string of token separators (by default, whitespace is 
any combination of tabs, spaces, carriage returns, and linefeeds). wordchars 
defaults to alphanumeric characters (letters and numbers) and the underscore; it 
represents all valid token characters. Any character not in whitespace,
wordchars, quotes, or commenters is returned as a single-character token. 
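Here is how those defaults play out on a line of Lepto. In this sketch (lepto_tokens is a hypothetical helper), a number such as 0.5 is split into three tokens because the period is not in wordchars; adding it to wordchars keeps the number whole. The comment is skipped because # is in commenters.

```python
import io
import shlex

def lepto_tokens(text, extra_wordchars=""):
    """Return every token shlex finds in text (non-POSIX mode)."""
    lexer = shlex.shlex(io.StringIO(text), "example")
    lexer.wordchars += extra_wordchars
    tokens = []
    while True:
        tok = lexer.get_token()
        if tok == "":          # non-POSIX shlex signals end of input with ''
            return tokens
        tokens.append(tok)

default = lepto_tokens("f 1 l 60 # turn\nscale 0.5")
with_period = lepto_tokens("f 1 l 60 # turn\nscale 0.5", ".")
```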

source is a string holding the keyword shlex uses as the “import” or “include”
equivalent found in Python or C, telling shlex to read and parse the contents of a
file. Setting it to a value of beable, for example, means a user can use the following
command to include the contents of the file foofoo.txt:

beable "foofoo.txt" 

infile is the name of the current file (the original input file name, or the name of
the file currently being included), and instream has the filelike object from which
data is being read. The lineno member is the current source line number. For
debugging purposes, you can set the debug member to 1 or more to have shlex
generate more verbose output.

With your shlex object configured the way you want, all you need to do is
repeatedly call its get_token() method to retrieve the next token from the stream. When
all the input has been read, get_token returns an empty string. push_token(str)
pushes str onto the token stack (so that the next call to get_token returns str).
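get_token and push_token together give you one-token lookahead, which is handy when a parser must decide what to do based on the next word. A sketch (peek_token is a hypothetical helper):

```python
import shlex

def peek_token(lexer):
    """Return the next token without consuming it."""
    tok = lexer.get_token()
    lexer.push_token(tok)   # the next get_token() returns tok again
    return tok
```

At the end of input, peek_token returns the empty string and pushes it back, so the caller still sees the end of input on its next get_token call.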
When a user includes a file’s contents (using the keyword stored in source), the
sourcehook(path) method is called to locate and open the file called path. You
can override this method to implement your own file location algorithm;
sourcehook returns a 2-tuple (file name, open file object).
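Overriding sourcehook is also a convenient way to test include handling without touching the disk. In this sketch (DictIncludeLexer and all_tokens are hypothetical names), included files are looked up in a dictionary. The file name reaches sourcehook with its quotes still attached, so the override strips them, just as the default implementation does.

```python
import io
import shlex

class DictIncludeLexer(shlex.shlex):
    """A shlex whose include files come from a dict instead of the disk."""

    def __init__(self, text, files, keyword="include"):
        super().__init__(io.StringIO(text), "<main>")
        self.files = files          # maps file name -> file contents
        self.source = keyword       # the word that triggers an include

    def sourcehook(self, newfile):
        # The quoted file name arrives with its quotes still attached.
        if newfile[:1] in ('"', "'"):
            newfile = newfile[1:-1]
        return (newfile, io.StringIO(self.files[newfile]))

def all_tokens(lexer):
    """Drain a non-POSIX shlex, which signals end of input with ''."""
    tokens = []
    while True:
        tok = lexer.get_token()
        if tok == "":
            return tokens
        tokens.append(tok)
```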

If you need to print out an error message, prefix your message with the string
returned from the object’s error_leader([file[, line]]) method. Unless you
indicate otherwise, it uses the current file name and line number to return a
message header string that is friendly to editors such as Emacs. For example:

>>> print(s.error_leader()+'Expected a number')

"foofoo.txt", line 17: Expected a number 

Putting shlex to work

The parser in Listing 23-2 understands the simple Lepto language as described
earlier in this chapter. At the highest level, it repeatedly calls shlex.get_token to get
a command and then calls a corresponding parse_<command> method to read and
verify that command’s arguments. Each finished command is stored in a LeptoCmd
object (a simple container), all of which are buffered and eventually returned as a
list of commands.


Listing 23-2: leptoparser.py - Converts tokens
to LeptoCmd objects


import shlex,sys

class LeptoCmd:
    'Simple container class'
    def __init__(self,cmd,**kwargs):
        self.cmd = cmd
        self.__dict__.update(kwargs)

    def __repr__(self):
        s = 'LeptoCmd %s(' % self.cmd
        for item in self.__dict__.items():
            if item[0] != 'cmd':
                s += ' %s=%s' % item
        return s + ' )'

class LeptoParser:
    def __init__(self,stopOnError = 1):
        self.stopOnError = stopOnError

    def err(self,msg,dest=sys.stderr):
        dest.write(self.lexer.error_leader()+msg+'\n')

    def next_token(self):
        'Returns the next token or None on error'
        tok = self.lexer.get_token()
        if tok == '':
            return self.err('Unexpected end of file')
        return tok

    def next_number(self,func=float):
        'Returns the next token as a number, or None on error'
        tok = self.next_token()
        if tok:
            try: tok = func(tok)
            except ValueError:
                return self.err('Expected a number, not '+tok)
        return tok

    def parse_reset(self):
        tok = self.next_token()
        if tok:
            if not tok in ['all','direction','color',
                           'screen','scale','position',
                           'stacks']:
                return self.err('Invalid reset argument')
            return LeptoCmd('reset',arg=tok)

    def parse_push(self):
        tok = self.next_token()
        if tok:
            if not tok in ['color','direction',
                           'scale','position']:
                return self.err('Invalid push argument')
            return LeptoCmd('push',arg=tok)

    def parse_pop(self):
        tok = self.next_token()
        if tok:
            if not tok in ['color','direction',
                           'scale','position']:
                return self.err('Invalid pop argument')
            return LeptoCmd('pop',arg=tok)

    def amntcmd(self,cmd):
        'Util for commands with a single numerical arg'
        num = self.next_number()
        if num is not None: return LeptoCmd(cmd,amnt=num)

    # These are all nearly identical
    def parse_f(self): return self.amntcmd('f')
    def parse_b(self): return self.amntcmd('b')
    def parse_l(self): return self.amntcmd('l')
    def parse_r(self): return self.amntcmd('r')
    def parse_scale(self): return self.amntcmd('scale')

    def namecmd(self,cmd):
        'Util for commands with a single string arg'
        tok = self.next_token()
        if tok: return LeptoCmd(cmd,name=tok)

    # More nearly identical stuff
    def parse_color(self): return self.namecmd('color')
    def parse_sub(self): return self.namecmd('sub')
    def parse_call(self): return self.namecmd('call')

    def parse_end(self): return LeptoCmd('end')

    def parse_repeat(self):
        num = self.next_number(int)
        if num is not None:
            n = self.next_token()
            if n:
                return LeptoCmd('repeat',count=num,name=n)

    def parse(self, stream=None, name=None):
        'Returns a list of LeptoCmd objects'
        lexer = shlex.shlex(stream,name)
        lexer.source = 'include'
        lexer.wordchars += '.' # For numbers such as 0.5
        self.lexer = lexer
        cmds = []
        while 1:
            tok = lexer.get_token()
            if tok == '': # End of the file
                break
            # See if there's a parser for this token
            parser = 'parse_'+tok
            if not hasattr(self,parser):
                self.err('Unknown command: '+tok)
                if self.stopOnError:
                    break
                else:
                    continue
            # Call the parser to convert to a LeptoCmd object
            cmd = getattr(self,parser)()
            if cmd is None:
                if self.stopOnError: break
                else: continue
            cmds.append(cmd)
        return cmds



440 Part IV > User Interfaces and Multimedia 


Basically, you create a LeptoParser object, pass it a stream, and it returns to you a 
list of LeptoCmd objects, checking for errors along the way. Later sections will make 
use of the LeptoParser class, but you can already verify that it works correctly: 


>>> import leptoparser
>>> p = leptoparser.LeptoParser()
>>> p.parse()
color red                # You enter these lines
f 10 l 20
f 10 l 5 f 5
                         # Hit Ctrl-Z (Win) or Ctrl-D (Unix)
[LeptoCmd color( name=red ), LeptoCmd f( amnt=10.0 ), LeptoCmd l( amnt=20.0 ),
LeptoCmd f( amnt=10.0 ), LeptoCmd l( amnt=5.0 ), LeptoCmd f( amnt=5.0 )]


Adding Interactive-Mode Features 

The next step toward a finished application is to add a “shell” similar to the one
you get when you run Python in interactive mode. The shell passes commands through
to the parser and also provides online help.


Using the cmd module 

The cmd module defines the Cmd class, which provides some scaffolding for building
an interactive command-line interpreter. Because it is just scaffolding, you normally
don’t use it directly; instead, you create a subclass. If the readline module is
present, cmd automatically uses its editing and history features.

The readline module is an optional UNIX module, covered in Chapter 38.


The Cmd constructor takes no required arguments, but once you have a Cmd object (or
an object of your subclass), you can use the following members to customize it.

The prompt member is the input prompt displayed while the user enters a command;
by default it is '(Cmd) '. identchars is a string containing all acceptable
characters in a command prefix (defaulting to letters, numbers, and underscores).

For each line of input from a user, Cmd considers the first token to be a command 
prefix, and it uses that prefix to dispatch the input to a handler method. For exam- 
ple, if the first word on the line of input is the string reverse, then Cmd sends the 
remainder of the line to its do_reverse (line) method, if present. If no handler is 
present, the line is sent to the default(line) method. 
Cmd comes with a few special commands. If a user enters help reverse or just ?
reverse, a built-in do_help method calls help_reverse(), if present, which you
can implement to print online help (just print it to stdout using one or more print
statements). A command prefix of just an exclamation point sends the remaining
arguments to a do_shell(line) method if it exists. If the input is just a blank line,
the emptyline() method is called, which by default repeats the previous input (by
calling self.onecmd(self.lastcmd)). Finally, when the end of user input is
reached, the do_EOF(line) method is called, if present.
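Here is a small, hypothetical ReverseShell (Python 3) that exercises these hooks without a terminal: it sends output to an io.StringIO through Cmd’s stdout argument and feeds lines straight to onecmd. Only the hook names (do_*, help_*, do_shell, default, emptyline) come from cmd.Cmd; the commands themselves are made up for the demo.

```python
import cmd
import io

class ReverseShell(cmd.Cmd):
    """Hypothetical demo shell; only the hook names come from cmd.Cmd."""
    prompt = '(rev) '

    def do_reverse(self, line):
        # 'reverse abc' is dispatched here with line == 'abc'
        self.stdout.write(line[::-1] + '\n')

    def help_reverse(self):
        # Reached through 'help reverse' or '? reverse'
        self.stdout.write('reverse <text>: prints <text> backwards\n')

    def do_shell(self, line):
        # Reached through '!ls'; this demo only echoes the request
        self.stdout.write('would run: ' + line + '\n')

    def default(self, line):
        # Any line whose first word has no matching do_ method
        self.stdout.write('Unknown command: ' + line + '\n')

out = io.StringIO()
shell = ReverseShell(stdout=out)
for line in ['reverse abc', '', '? reverse', '!ls', 'jump 3']:
    shell.onecmd(line)   # '' calls emptyline(), which repeats 'reverse abc'
print(out.getvalue())
```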

onecmd(line) takes an entire line of input and processes it as if it had been
entered by the user.

The cmdloop([intro]) method makes Cmd repeatedly prompt the user for input
and then dispatch it. intro is a message to display before entering the loop; if
omitted, Cmd displays the message in self.intro, which is empty by default. You
can implement the preloop() and postloop() methods to do work immediately
before and after Cmd goes into its loop (i.e., each is called once per call to
the cmdloop method).

For each line of input, Cmd performs a series of calls like the following:

stop = None
line = raw_input(self.prompt)
line = self.precmd(line)
stop = self.onecmd(line)
stop = self.postcmd(stop, line)

It receives user input, sends it to precmd (where you can modify it if you want), and
then passes it off to onecmd, where the correct do_<command> method is called. If,
at the end of the loop, stop is a true value, cmdloop calls postloop and
then returns.
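To watch that cycle run without typing, you can hand cmdloop a file-like object: Cmd accepts stdin and stdout arguments, and setting use_rawinput to false makes it read lines with stdin.readline() instead of prompting through the console. The EchoShell class below is a hypothetical Python 3 sketch; its precmd lowercases every line and its postcmd records what it saw.

```python
import cmd
import io

class EchoShell(cmd.Cmd):
    """Hypothetical shell showing the precmd/onecmd/postcmd cycle."""
    prompt = '> '

    def preloop(self):
        self.log = ['preloop']

    def precmd(self, line):
        return line.lower()          # normalize input before dispatch

    def do_echo(self, line):
        self.stdout.write(line + '\n')

    def do_quit(self, line):
        return True                  # a true value ends cmdloop

    def postcmd(self, stop, line):
        self.log.append('postcmd ' + line)
        return stop

    def postloop(self):
        self.log.append('postloop')

script = io.StringIO('ECHO Hello\nQUIT\necho never reached\n')
out = io.StringIO()
shell = EchoShell(stdin=script, stdout=out)
shell.use_rawinput = False           # read from shell.stdin, not the console
shell.cmdloop()
print(repr(out.getvalue()), shell.log)
```

The third input line is never read: do_quit returned a true value, so cmdloop called postloop and returned.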

If a user enters help with no other argument, do_help displays a sort of table of
contents of available help topics:

print self.doc_header
print self.ruler * len(self.doc_header)
print <all do_ methods that have a help_ method>
print self.misc_header
print self.ruler * len(self.misc_header)
print <all help_ methods that don't have a do_ method>
print self.undoc_header
print self.ruler * len(self.undoc_header)
print <all do_ methods without a help_ method>
Sample output might look something like the following:

Documented commands (type help <topic>):
go stop add subtract delete

Miscellaneous help topics:
OverView rules

Undocumented commands:
quit
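The following hypothetical GameShell (Python 3) puts one name in each of the three groups and captures the help table in a string. Current versions of cmd also count a do_ method that has a docstring as documented, which is why the built-in help command itself appears in the first group.

```python
import cmd
import io

class GameShell(cmd.Cmd):
    """Hypothetical shell with one command in each help category."""

    def do_go(self, line):          # documented: a help_go method exists
        pass

    def help_go(self):
        self.stdout.write('go <place>\n')

    def do_quit(self, line):        # undocumented: no help_quit, no docstring
        return True

    def help_rules(self):           # misc topic: help_ without a do_ method
        self.stdout.write('There are no rules.\n')

out = io.StringIO()
GameShell(stdout=out).onecmd('help')   # 'help' with no argument: the table
print(out.getvalue())
```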

Subclassing cmd.Cmd 

Listing 23-3 contains the next piece of the Lepto application, and it’s a good way to
see a Cmd object in action; it defines LeptoCon, a Cmd subclass that wraps the Lepto
parser so that users have online help and readline support, if present.


Listing 23-3: leptocon.py — Lepto Interactive console 


import cmd, leptoparser, cStringIO

def defaultHandler(cmds):
    'Simple handler for testing'
    for cmd in cmds:
        print cmd

class LeptoCon(cmd.Cmd):
    normalPrompt = ': '
    subPrompt = '.. '

    def __init__(self,handler=defaultHandler):
        cmd.Cmd.__init__(self)
        self.timeToQuit = 0
        self.prompt = self.normalPrompt
        self.parser = leptoparser.LeptoParser()
        self.doc_header = "Type 'help <topic>' for info on:"
        self.intro = 'Welcome to Lepto!\n'\
                     "Enter a command or type 'help'"
        self.misc_header = ''
        self.undoc_header = ''
        self.handler = handler

    def do_sub(self,line):
        'Change the prompt for subroutines'
        self.prompt = self.subPrompt
        self.default('sub '+line) # Now process normally

    def do_end(self,line):
        'Change the prompt back after subroutines'
        self.prompt = self.normalPrompt
        self.default('end '+line) # Now process normally

    def default(self,line):
        'Called on normal commands'
        sio = cStringIO.StringIO(line)
        cmds = self.parser.parse(sio,'Console')
        self.handler(cmds)

    def do_quit(self,line):
        self.timeToQuit = 1

    def postcmd(self,stop,line):
        if self.timeToQuit:
            return 1
        return stop

    # Now come all the online documentation functions
    def help_help(self): print 'I need help!'
    def help_quit(self): print 'Duh.'

    def help_reset(self):
        print 'reset <all | direction | color | '\
              'screen | scale | position | stacks>'
        print 'Reverts to default settings'

    def help_color(self):
        print 'color <name | None>'
        print 'Changes current color to <name> or no color'\
              ' for invisible movement'

    def help_push(self):
        print 'push <color | direction | scale | position>'
        print 'Saves an attribute to its own stack'

    def help_pop(self):
        print 'pop <color | direction | scale | position>'
        print 'Retrieves a previously pushed attribute'

    def help_f(self):
        print 'f <amnt>'
        print 'Moves forward in the current direction'

    def help_b(self):
        print 'b <amnt>'
        print 'Moves opposite of the current direction'

    def help_l(self):
        print 'l <amnt>'
        print 'Turns left the specified number of degrees'

    def help_r(self):
        print 'r <amnt>'
        print 'Turns right the specified number of degrees'

    def help_scale(self):
        print 'scale <amnt>'
        print 'Multiplies the current scaling factor by amnt'

    def help_sub(self):
        print 'sub <name>'
        print 'Creates a new subroutine called name'
        print 'Be sure to terminate it using the end command'

    def help_end(self):
        print 'end\nEnds a subroutine definition'

    def help_call(self):
        print 'call <name>\nCalls a subroutine'

    def help_include(self):
        print 'include "file"\nExecutes the contents of a file'

    def help_repeat(self):
        print 'repeat <count> <name>'
        print 'Calls a subroutine several times'

if __name__ == '__main__':
    c = LeptoCon()
    c.cmdloop()


Because the parser handles entire commands, most commands are routed to the
default method, which passes the whole line on to the parser. Once again, this is
part of a still larger program, but you can test this portion of it to make sure
everything’s working. Here’s an example session from a Windows command line (the
text after each : prompt is what I typed):

C:\temp>python leptocon.py
Welcome to Lepto!
Enter a command or type 'help'
: help
Type 'help <topic>' for info on:
help  sub  quit  include  b  r  push  pop  l
scale  color  f  end  repeat  call  reset

: help call
call <name>
Calls a subroutine
: color red
LeptoCmd color( name=red )
: f 10 r 20 f 10
LeptoCmd f( amnt=10.0 )
LeptoCmd r( amnt=20.0 )
LeptoCmd f( amnt=10.0 )
: quit

C:\temp>

Notice that you can enter more than one command per line as long as the entire
command is on that line. The default command handler does nothing but print the
commands to stdout, but it at least lets you see what’s happening.

It may seem like overkill to use both shlex and cmd because there is some overlap
in what they do (I could have just implemented methods such as do_color,
do_reset, and so on, for example). But as you’ve seen, using both made it easy to
test these first two parts independently, which could be important for languages
with more complex grammars. It also makes it easy to later re-use LeptoParser for
handling input directly from a file. Furthermore, it enables you to easily add
interactive-mode features (such as online help and using a different prompt when
the user is defining a subroutine) without cluttering the parsing code.


Executing Lepto Commands 

Now that you have a Lepto parser and a user-friendly interface, all you need is
something to act on those commands. The code in Listing 23-4 builds upon the previous
two sections and creates a graphical display showing the results of the Lepto scripts
(the display is nothing more than a Tkinter window with a single canvas widget).


Listing 23-4: leptogui.py - Plots Lepto commands


from Tkinter import *
import leptocon, threading, math

deg2rad = math.pi * 2.0 / 360.0

class LeptoGUI:
    def __init__(self,canvas):
        self.canvas = canvas
        self.subs = {}
        self.newSub = None
        self.firstCmd = 1

    def do_reset_direction(self): self.direction = 0
    def do_reset_color(self): self.color = 'black'
    def do_reset_scale(self): self.scale = 1.0

    def do_reset_position(self):
        # Move to center of canvas
        x = self.canvas.winfo_width() / 2
        y = self.canvas.winfo_height() / 2
        self.position = (x,y)

    def do_reset_screen(self):
        ids = self.canvas.find_all()
        self.canvas.delete(*ids)

    def do_reset_stacks(self):
        self.direction_stk = []
        self.color_stk = []
        self.scale_stk = []
        self.position_stk = []

    def do_reset_all(self):
        self.do_reset_direction()
        self.do_reset_color()
        self.do_reset_scale()
        self.do_reset_position()
        self.do_reset_screen()
        self.do_reset_stacks()

    def do_reset(self,cmd):
        'Reset color, position, etc'
        getattr(self,'do_reset_'+cmd.arg)()

    def do_color(self,cmd):
        'Change color'
        self.color = None
        if cmd.name.lower() != 'none':
            self.color = cmd.name

    def do_push(self,cmd):
        'Push a color, position, etc'
        arg = cmd.arg
        getattr(self,arg+'_stk').append(getattr(self,arg))

    def do_pop(self,cmd):
        'Pop a color, position, etc'
        stk = getattr(self,cmd.arg+'_stk')
        if len(stk):
            setattr(self,cmd.arg,stk.pop())

    def do_f(self,cmd):
        'Move forward'
        x,y = self.position
        dir = self.direction * deg2rad
        amnt = self.scale * cmd.amnt
        nx = x + amnt * math.cos(dir)
        ny = y - amnt * math.sin(dir)   # canvas y grows downward
        if self.color:
            self.canvas.create_line(x, y, nx, ny, width=1, \
                                    fill=self.color)
        self.position = (nx,ny)

    def do_b(self,cmd):
        'Move backward'
        self.direction = (self.direction + 180) % 360
        self.do_f(cmd)
        self.direction = (self.direction - 180) % 360

    def do_l(self,cmd):
        'Turn left'
        self.direction = (self.direction + cmd.amnt) % 360

    def do_r(self,cmd):
        'Turn right'
        self.direction = (self.direction - cmd.amnt) % 360

    def do_scale(self,cmd):
        'Change scale'
        self.scale *= cmd.amnt

    def do_sub(self,cmd):
        'Create a new subroutine'
        if self.newSub:
            print "Can't create nested subroutines"
            return
        self.newSub = cmd.name
        self.subs[cmd.name] = []

    def do_end(self,cmd):
        'Finish creating a subroutine'
        if not self.newSub:
            print 'No subroutine to end'
            return
        self.newSub = None

    def do_call(self,cmd):
        'Invoke a subroutine'
        sub = cmd.name
        if sub in self.subs:
            self.cmdHandler(self.subs[sub])
        else:
            print 'Unknown subroutine',sub

    def do_repeat(self,cmd):
        'repeat - Just do_call <count> times'
        c = leptocon.leptoparser.LeptoCmd('call',name=cmd.name)
        for i in range(int(cmd.count)):   # the parser stores count as a float
            self.do_call(c)

    def cmdHandler(self,cmds):
        'Called for each command object'
        if self.firstCmd:
            # Widget info (w,h) won't be ready in the
            # constructor, but it will be ready by now
            self.firstCmd = 0
            self.do_reset_all()
        for cmd in cmds:
            if self.newSub and cmd.cmd != 'end':
                self.subs[self.newSub].append(cmd)
            else:
                getattr(self,'do_'+cmd.cmd)(cmd)

if __name__ == '__main__':
    # Create a Tk window with a canvas
    root = Tk()
    root.title('LeptoGUI')
    canvas = Canvas(root,bg='white')
    canvas.pack()
    gui = LeptoGUI(canvas)

    # Let Tkinter run in the background
    threading.Thread(target=root.mainloop).start()

    # Repeatedly get commands and process them
    c = leptocon.LeptoCon(gui.cmdHandler)
    c.cmdloop()
    root.quit()


leptogui.py uses the usual trick of dispatching commands by taking a command
name (such as scale), converting it to a method name (do_scale), and then invoking
it. Because the parser took care of so much of the work, the final pieces of the
graphical application ended up being quite simple and straightforward.
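The movement commands in Listing 23-4 boil down to a little trigonometry. As a sketch that runs without Tkinter, the hypothetical step and turn functions below restate the arithmetic from do_f, do_l, and do_r, then trace a box like the one in the sample session (f 40 r 90, four times):

```python
import math

DEG2RAD = math.pi * 2.0 / 360.0

def step(position, direction, scale, amnt):
    """New (x, y) after moving scale*amnt units in direction (degrees).

    0 degrees points right and angles grow counterclockwise, as in
    LeptoGUI.do_f; y is subtracted because canvas y grows downward."""
    x, y = position
    rad = direction * DEG2RAD
    dist = scale * amnt
    return (x + dist * math.cos(rad), y - dist * math.sin(rad))

def turn(direction, degrees):
    """Positive degrees turn left (do_l); negative turn right (do_r)."""
    return (direction + degrees) % 360

# Trace 'f 40 r 90' four times from (100, 100): the square closes.
pos, heading = (100.0, 100.0), 0
for _ in range(4):
    pos = step(pos, heading, 1.0, 40)
    heading = turn(heading, -90)
print(pos, heading)
```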

Launch leptogui.py to give Lepto a try. Following is a sample session; the resulting
output is shown in Figure 23-2.






Chapter 25 4- Building Simple Command Interpreters 449 


C:\temp>leptogui.py
Welcome to Lepto!
Enter a command or type 'help'
: color blue f 40 r 90
: color green f 40 r 90
: color red f 40 r 90
: color brown f 40 r 90
: l 90 color none f 20 # Please step away from the box
: color black
: sub rayrot # Draws a ray and then rotates left
push position
f 100
pop position
l 5
end
: repeat 10 rayrot




Figure 23-2: Sample output from a program written 
in the custom language called Lepto 

You can store useful subroutines in a separate file and import them using the
include command. For example, save the Lepto code that follows in Listing 23-5 to
a file called shapes.lep, and try the following (the output is shown in Figure 23-3):

C:\temp>leptogui.py
Welcome to Lepto!
Enter a command or type 'help'
: include "shapes.lep"
: color blue
: call circle
: color black
: scale 10
: call tri
: r 180
: color green
: call box
: sub cirrot # draw a circle and rotate a little
push color push position
color none
f 50
pop color
call circle
pop position
r 20
end
: reset scale reset position reset color
: repeat 18 cirrot


Listing 23-5: shapes.lep - Sample Lepto include file


sub circedge f 10 r 15 end 

sub circle repeat 24 circedge end 

sub box f 10 r 90 f 10 r 90 f 10 r 90 f 10 r 90 end 

sub tri f 10 r 120 f 10 r 120 f 10 r 120 end 
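The include command needs no code of its own in Lepto: LeptoParser.parse sets lexer.source = 'include', and shlex then replaces the word include and the quoted file name after it with the tokens of that file. The Python 3 sketch below demonstrates this with a temporary file (the file name shapes_demo.lep is invented for the demo):

```python
import os
import shlex
import tempfile

# Write a tiny Lepto "library" file to a temporary directory.
tmpdir = tempfile.mkdtemp()
libpath = os.path.join(tmpdir, 'shapes_demo.lep')
with open(libpath, 'w') as f:
    f.write('sub tri f 10 r 120 f 10 r 120 f 10 r 120 end\n')

# Same lexer setup as LeptoParser.parse: 'include' switches input files.
lexer = shlex.shlex('include "%s" color red call tri' % libpath)
lexer.source = 'include'
lexer.wordchars += '.'
tokens = list(iter(lexer.get_token, ''))   # included tokens come first
print(tokens)

os.remove(libpath)   # shlex already closed the included file at its end
os.rmdir(tmpdir)
```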



Figure 23-3: Lepto program using subroutines stored in a separate file 


Lepto is a simple, yet realistic, example of how you can benefit from the shlex and
cmd modules. A good exercise to try now would be to expand the grammar of Lepto
to make it more powerful. For example, you could add support for variables and
expressions (let Python do the work of evaluation via the eval function), or you
could let the repeat statement accept a sequence of commands instead of forcing
users to define a subroutine first.
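As one possible starting point for the variables-and-expressions exercise, the hypothetical evaluate helper below shows how eval could compute a numeric argument. It assumes variables live in a plain dictionary (perhaps filled by a new set command, also hypothetical). With Lepto’s current wordchars, an expression such as size*2+5 would be split into several tokens, so the grammar would also need to quote expressions or extend wordchars.

```python
import math

def evaluate(expr, variables):
    """Evaluate a numeric expression such as 'size*2+5'.

    Hypothetical helper: builtins are hidden, and only a few math names
    plus the program's variables are visible. This keeps typos from
    reaching arbitrary Python names, but eval is NOT a security sandbox."""
    names = {'sqrt': math.sqrt, 'sin': math.sin, 'cos': math.cos,
             'pi': math.pi}
    names.update(variables)
    return float(eval(expr, {'__builtins__': {}}, names))

variables = {'size': 10}
print(evaluate('size*2+5', variables))
```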













Summary 

A scripting interface to a program gives your users powerful tools to work with.
Python’s shlex module makes lexical analysis a lot less tedious, and cmd gives
you a base upon which you can build a flexible command-line interface. In this
chapter, you:

♦ Created a parser for a simple scripting language.

♦ Wrapped the parser in a command-line interface complete with built-in online
help.

♦ Built an interpreter for the parser output that plots drawings graphically.

The next chapter covers Python’s support for processing and playing sound files in 
various formats. 



Playing Sound 



Sound is stored in a bewildering range of formats.
Fortunately, Python’s standard libraries can read, write,
and convert a wide range of audio files. You can also play
back sounds on a variety of platforms.


Sound File Basics

Sound is basically vibration in the air. The louder the sound, 
the more forceful the vibration. The higher the sound, the 
faster the vibration. 

To store sound digitally, a microphone or other recorder mea- 
sures (or samples) the analog sound waveform many times 
per second. Each sample takes the form of a number, and this 
number measures the amplitude of the sound waves at an 
instant in time. A speaker can later translate digitized sound 
(this long list of integers) back into sound waves. 

There are many, many ways to digitize and store sound. They
can differ in several ways:

♦ Sample rate: How many times per second the amplitude
of the sound waves is measured. A common sample
rate is 44100 Hz (samples per second), the rate used on
audio compact discs.

♦ Sample encoding: The simplest (and most common) is
linear encoding, where each sample is a linear measurement
of amplitude. Other encoding types include u-LAW,
in which measurement is performed on a logarithmic
scale.

♦ Sample width: A sample can be an 8-bit, 16-bit, or 32-bit
integer.

♦ Channels: Sound can be recorded with one, two, or
more channels. This boils down to the storage of one or
more audio streams together in one file. The corresponding
samples from each channel are normally
stored together in one frame.


In This Chapter

Sound file basics

Playing sounds

Examining audio files

Reading and writing audio files

Handling raw audio data
All sound formats make some trade-offs between sound quality (how much information
is lost in digitizing the sound) and file size (the better the sound quality, the
more data needs to be stored). For example, one second of sound could be stored
in 8-bit mono at a sample rate of 22050 Hz; the total space used would be 22050 bytes.
Storing the same sound in 16-bit stereo at a sampling rate of 44100 Hz would require
44100 frames at 4 bytes per frame, for a total of 176400 bytes (8 times as much
space).
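The same arithmetic as a small Python function (the name storage_bytes is ours). It assumes uncompressed linear audio, where every frame holds one sample per channel:

```python
def storage_bytes(seconds, sample_rate, sample_width, channels):
    """Bytes needed for uncompressed audio.

    sample_width is in bytes (1 for 8-bit, 2 for 16-bit); one frame holds
    one sample per channel, so frame size = sample_width * channels."""
    frame_size = sample_width * channels
    return int(seconds * sample_rate * frame_size)

print(storage_bytes(1, 22050, 1, 1))   # 8-bit mono
print(storage_bytes(1, 44100, 2, 2))   # 16-bit stereo
```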


Playing Sounds 

Because playing sound is tied to the operating system (OS), the libraries for playing 
sound are also OS-specific. 

Playing sound on Windows 

The module winsound plays sound on a Windows system. The function
Beep(frequency, duration) uses the computer’s internal speaker to play a
sound at pitch frequency for duration milliseconds. The frequency can range from
37 to 32767. If Beep can’t play the sound, it raises a RuntimeError. For example,
the following code plays a tinny-sounding major scale, starting from middle C.
Each note lasts half a second:

import winsound

ScalePitches=[262,294,330,349,392,440,494,523]
for Pitch in ScalePitches:
    winsound.Beep(Pitch,500)

The function PlaySound(sound,flags) plays a WAV file, using any available sound
card. The parameter sound can be a file name, an alias, an audio stream, or None.
The parameter flags should equal one or more constants, combined using
bitwise-OR.

Specify one (and only one) flag to indicate where the sound should come from:

♦ SND_FILENAME indicates that sound is the path to a WAV file.

♦ SND_ALIAS indicates that sound is the name of a control panel sound-association.

♦ SND_MEMORY indicates that sound is the contents of a WAV file.

For example:

import winsound

# Play a sound file from disk:
SoundFileName = "JudyGarlandKraftCheese.wav"
winsound.PlaySound(SoundFileName,winsound.SND_FILENAME)

# Play the "Exclamation" sound, as set up in Control Panel
# (its registry alias is SystemExclamation):
winsound.PlaySound("SystemExclamation",winsound.SND_ALIAS)

# Read sound file from disk, then play it:
SoundFile=open(SoundFileName,"rb")
winsound.PlaySound(SoundFile.read(),winsound.SND_MEMORY)
SoundFile.close()


Other flags let you tweak behavior:

SND_ASYNC       Start playing the sound and return immediately.
                Otherwise, the call to PlaySound doesn’t return until the
                sound has finished playing.
SND_LOOP        Keep playing the sound indefinitely. (This flag should be
                combined with SND_ASYNC.)
SND_PURGE       Stop the specified sound.
SND_NOSTOP      Don’t stop currently playing sounds. (Raise RuntimeError
                if a sound is playing.)
SND_NOWAIT      Return immediately if the sound driver is busy.
SND_NODEFAULT   If the sound is not found, don’t play a default beep.


Playing and recording sound on SunOS 

The Sun audio hardware can play audio data in u-LAW format, with a sample rate of
8000 Hz. The module sunaudiodev enables you to manipulate the Sun audio hardware
using a filelike object. The related module SUNAUDIODEV provides various constants
for use with sunaudiodev.

The function open(mode) returns an audio device object. The parameter mode can
be r for recording, w for playback, rw for both, or control for control access.


Playing sound 

The method write(samples) plays sound, where samples is audio data as a string.
A call to write adds the audio data into the audio device’s buffer. If the buffer
doesn’t have enough room to contain samples, write will not return immediately.

The method obufcount returns the number of samples currently buffered for
playback.

The method flush stops any currently playing sound, and clears the audio output
buffer. The method drain waits until playback is complete, and then returns.


Recording sound 

The method read(size) reads exactly size samples from the audio input, and
returns them as a string. It blocks until enough data is available.

The method ibufcount returns the number of samples buffered for recording; you
can read up to this many samples without blocking.


Controlling the audio device 

The audio device provides a status object. The object has no methods, but has
attributes as described in the audio man page. The device object provides accessors
getinfo and setinfo for the status object.

The method fileno returns the file descriptor for the audio device.


Examining Audio Files 

Because there are so many file formats for storing sound, it is sometimes difficult to
know which format a particular file uses. The module sndhdr provides a function,
what(filename), that examines the file filename and returns its storage format.
(The function whathdr is a synonym for what.)

The return value of what is a 5-tuple of the form (type, SampleRate, channels,
frames, BitsPerSample). Here, type is the data type; its possible values are aifc, aiff,
au, hcom, sndr, sndt, voc, wav, 8svx, sb, ub, and ul. The value BitsPerSample is A for
A-LAW encoding, U for u-LAW encoding, or the number of bits for standard encoding.

The values SampleRate and channels are 0 if they cannot be determined. The value 
frames is -1 if it cannot be determined. If what is completely stumped (for example, 
if the file isn’t a sound file at all), it returns None. 

For example, the following code examines a .wav file. The file has a sampling rate of
11025 Hz. It is in mono, and uses 8 bits per sample:

>>> print sndhdr.what("bond.wav")
('wav', 11025, 1, -1, 8)

This file is in SunAudio format, in mono, with 188874 frames in all:

>>> params=sndhdr.what("fallofthephoton.au")
>>> params
('au', 8012, 1, 188874, 'U')
>>> float(params[3])/params[1] # sound length (in seconds)
23.573889166250623


Reading and Writing Audio Files 

The modules aifc, wave, and sunau handle AIFF, WAV, and AU files, respectively.
The interfaces for the modules are almost identical. The aifc module is documented
first, followed by an accounting of the differences.
Reading and writing AIFF files with aifc

The function open(file[, mode]) returns an audiofile object. The parameter file is
either a file name or an open filelike object. If file is a file name, use the mode
parameter to control how the file is opened.

File format 

An audiofile object provides accessors for the file format. You can access each
component of the file format on any audiofile. You can also set the file format on a new
audiofile, but only before writing any frames:

♦ getnchannels, setnchannels(channels): Access the number of channels.

♦ getsampwidth, setsampwidth(size): Access the size, in bytes, of each
sample.

♦ getframerate, setframerate(rate): Access the number of frames per
second.

♦ getnframes, setnframes(frames): Access the number of frames in the
entire file.

♦ getcomptype, getcompname, setcomptype(type, name): Access the compression
scheme. getcomptype returns the compression scheme as a code:
NONE, ALAW, ULAW, or G722. getcompname returns the compression scheme
as a human-readable string. Of the parameters to setcomptype, type should
be a code (as returned by getcomptype), and name should be a human-readable
name (as returned by getcompname).

The method setparams sets all of these values at once. Its argument is a six-element
tuple of the form (Channels,SampleWidth,FrameRate,FrameCount,CompType,CompName).
The method getparams returns the parameters in the same order.
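The wave module shares this interface and is still part of current Python 3 releases, so it makes a convenient test bed. The sketch below writes a tenth of a second of a 440 Hz tone to an in-memory WAV file with setparams and reads the same six values back with getparams; the tone and sizes are arbitrary choices for the demo.

```python
import io
import math
import struct
import wave

RATE = 8000                 # frames per second
NFRAMES = RATE // 10        # one tenth of a second
# A 440 Hz sine wave as signed 16-bit samples (mono: one sample per frame)
samples = [int(16000 * math.sin(2 * math.pi * 440 * i / RATE))
           for i in range(NFRAMES)]
data = struct.pack('<%dh' % NFRAMES, *samples)   # WAV data is little-endian

buf = io.BytesIO()
out = wave.open(buf, 'wb')
# (nchannels, sampwidth, framerate, nframes, comptype, compname)
out.setparams((1, 2, RATE, NFRAMES, 'NONE', 'not compressed'))
out.writeframes(data)
out.close()                 # finalizes the header; buf itself stays open

buf.seek(0)
wav = wave.open(buf, 'rb')
params = wav.getparams()    # same six values, same order
frames = wav.readframes(params[3])
wav.close()
print(params)
```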

Note: Usually, you need not call setnframes to write out a new file, because the
number of frames is written to the file's header when you call close. However, if
you open a filelike object that does not support seeking, then you must call
setnframes before writing out audio data.


Input 

The method readframes(count) reads count frames of audio data from the file, 
returning them (decompressed) in a string. 

Output 

The method writeframes(data) writes the audio data data to the file. The method
writeframesraw(data) writes audio data without updating the header; it is useful
for writing to a filelike object with no seek method.
Frame numbers

When reading, the method setpos(framenumber) jumps to frame framenumber,
and the method rewind jumps to the beginning of the file (frame 0).

The method tell returns the current frame number; when writing, this is the
number of frames written so far.
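A short sketch with the wave module shows setpos, tell, and rewind together. It builds a 100-frame in-memory WAV whose frame i holds the value i, so the values read back identify the frames:

```python
import io
import struct
import wave

# In-memory WAV (16-bit mono) whose frame i holds the sample value i.
buf = io.BytesIO()
w = wave.open(buf, 'wb')
w.setparams((1, 2, 8000, 100, 'NONE', 'not compressed'))
w.writeframes(struct.pack('<100h', *range(100)))
written = w.tell()           # frames written so far: 100
w.close()

buf.seek(0)
r = wave.open(buf, 'rb')
r.setpos(40)                 # jump to frame 40
values = struct.unpack('<3h', r.readframes(3))   # frames 40, 41, 42
after_read = r.tell()        # 43
r.rewind()                   # back to frame 0
after_rewind = r.tell()      # 0
r.close()
print(written, values, after_read, after_rewind)
```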


Using markers 

An AIFF file can have one or more markers. A marker has an id number, a position
(frame number), and a name. To create a marker when writing a file, call
setmark(id,position,name). When reading a file, you can access a list of
markers by calling getmarkers. Each list element is a tuple of the form
(id,position,name). You can also access a particular marker with getmark(id).

Reading and writing AU files with sunau 

The interface of the sunau module is basically the same as that of aifc, with the
following two exceptions:

♦ The available compression types are limited to ALAW, ULAW, and NONE.

♦ Markers are not available. Stub marker methods are provided for compatibility
with aifc.

Reading and writing WAV files with wave 

The interface of the wave module is basically the same as that of aifc, with these
two exceptions:

♦ Compression is not available; the only supported scheme is NONE.

♦ Markers are not available. Stub marker methods are provided for compatibility
with aifc.

Example: Reversing an audio file 

Listing 24-1 reads in an audio file, and then writes out the same sound played
backwards. Note that this could also be accomplished by one call to audioop.reverse
(see “Handling Raw Audio Data,” later in this chapter). This example does things
the long way for purposes of exposition.
Listing 24-1: ReverseSound.py


"""Reverse a sound file. Handy for finding subliminal
messages."""

import sndhdr
import aifc
import sunau
import wave

def ReverseAudioStream(AudioFileIn,AudioFileOut):
    """Reverse an audio file (takes two opened audiofiles
    as arguments)"""
    # Get header info from the input file; write it out to
    # the output file.
    Params=AudioFileIn.getparams()
    AudioFileOut.setparams(Params)
    # Collect all the frames into a list, then write them out
    # in reversed order:
    FrameCount=AudioFileIn.getnframes()
    FrameDataList=[]
    for FrameIndex in range(FrameCount):
        FrameDataList.append(AudioFileIn.readframes(1))
    for FrameIndex in range(FrameCount-1,-1,-1):
        AudioFileOut.writeframes(FrameDataList[FrameIndex])
    # We're done! Close the files.
    AudioFileIn.close()
    AudioFileOut.close()

def ReverseAudioFile(InputFileName,OutputFileName):
    """Reverse an audio file (takes two file names as arguments)"""
    # First, check to see what kind of file it is:
    FileInfo=sndhdr.what(InputFileName)
    if FileInfo is None:
        print "Unknown sound format - can't reverse:",InputFileName
        return
    FileType=FileInfo[0]
    try:
        if FileType=="aifc" or FileType=="aiff":
            # aiff/aifc: use aifc module
            InFile=aifc.open(InputFileName,"rb")
            OutFile=aifc.open(OutputFileName,"wb")
        elif FileType=="au":
            # Sun Audio format: use sunau module
            InFile=sunau.open(InputFileName,"rb")
            OutFile=sunau.open(OutputFileName,"wb")
        elif FileType=="wav":
            # Wave format: use wave module
            InFile=wave.open(InputFileName,"rb")
            OutFile=wave.open(OutputFileName,"wb")
        else:
            print "Sorry, can't reverse type",FileType
            return
        ReverseAudioStream(InFile,OutFile)
    except IOError:
        print "Unable to open file!"
        return

if (__name__=="__main__"):
    # Reverse a file. Then reverse it again, to get
    # (hopefully) the same thing we started with:
    ReverseAudioFile("test.wav","backwards.wav")
    ReverseAudioFile("backwards.wav","forwards.wav")
    # Try another audio format, too:
    ReverseAudioFile("test.au","backwards.au")
    ReverseAudioFile("backwards.au","forwards.au")
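Listing 24-1 is written for the Python of its day, and the sndhdr, aifc, and sunau modules it imports were removed from the standard library in Python 3.13. As a minimal sketch for current Python 3, assuming WAV data only, the following reverses a file held in memory with the wave module. It reads every frame at once and reverses whole frames (sampwidth times nchannels bytes each), so multi-byte and stereo samples stay intact. The helpers make_wav and read_frames are hypothetical conveniences for trying it out.

```python
import io
import wave

def reverse_wav_bytes(wav_bytes):
    """Return a new WAV file (as bytes) with its frames in reverse order."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        data = src.readframes(params.nframes)
    frame_size = params.sampwidth * params.nchannels  # bytes per frame
    # Slice into whole frames so each multi-byte (or stereo) frame stays intact.
    frames = [data[i:i + frame_size] for i in range(0, len(data), frame_size)]
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)               # same format as the input
        dst.writeframes(b"".join(reversed(frames)))
    return out.getvalue()

def make_wav(frame_bytes, nchannels=1, sampwidth=2, framerate=8000):
    """Hypothetical helper: build a small WAV file in memory for testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(nchannels)
        w.setsampwidth(sampwidth)
        w.setframerate(framerate)
        w.writeframes(frame_bytes)
    return buf.getvalue()

def read_frames(wav_bytes):
    """Hypothetical helper: return the raw frame data of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.readframes(w.getnframes())
```

Reversing twice should give back the original frames, which is the same sanity check the listing's main block performs with files on disk.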


Reading IFF chunked data 

Some sound files are divided into chunks, including AIFF files and Real Media File
Format (RMFF) files. The chunk module provides a class, Chunk, to make it easier to
read these files.

Each chunk consists of an ID (4 bytes), a length (4 bytes), data (many bytes), and
possibly one byte of padding to make the next chunk start on a 2-byte boundary.
The length generally does not include the 8 header bytes. The length is normally
stored in big-endian format (most-significant byte first).
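To make this layout concrete, here is a pure-Python sketch that walks IFF-style chunks held in a byte string using struct. It is not the chunk module itself (which was removed from the standard library in Python 3.13); the function name iter_chunks and its parameters are illustrative, and it assumes the length field excludes the 8 header bytes, the common case just described.

```python
import struct

def iter_chunks(data, bigendian=True, align=True):
    """Yield (chunk_id, payload) pairs from IFF-style chunked bytes.

    Assumes each length field counts only the data, not the 8 header bytes.
    """
    header = ">4sL" if bigendian else "<4sL"  # 4-byte ID, 4-byte unsigned length
    pos = 0
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from(header, data, pos)
        pos += 8                      # skip past the ID and length fields
        yield chunk_id, data[pos:pos + size]
        pos += size
        if align and size % 2:        # an odd-sized chunk is followed by one pad byte
            pos += 1
```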

The constructor has the following syntax:

Chunk(file[,align[,bigendian[,inclheader]]])

Here, file is an opened file-like object that contains chunked data. The flag align
indicates whether chunks are aligned on 2-byte boundaries. The flag bigendian
indicates whether the chunk length is a big-endian number. And the flag inclheader
indicates whether the length includes the 8 header bytes. Parameters align and
bigendian default to true; inclheader defaults to false.

The methods getname and getsize return the ID and the size of the chunk,
respectively. The method close skips to the end of the current chunk, but does not
close the underlying file. After calling close on a chunk, you can no longer read or
seek it.







The method read([size]) reads up to size bytes of data from the chunk, and
returns them as a string. If size is omitted or negative, it reads all the remaining
data in the chunk. If no data is left in the chunk, it returns an empty string.

The method tell returns the current offset into the chunk. The method skip
jumps to the end of the current chunk. And the method seek(pos[,whence])
jumps to the position pos within the chunk. If whence is 0 (the default), pos is
measured from the start of the chunk's data. If whence is 1, pos is measured relative
to the current position. And if whence is 2, pos is measured relative to the end of
the chunk. In addition, the method isatty is defined and returns 0 (for compatibility
with normal file objects).

Normally, one iterates over chunks of a file by creating, reading, and closing several 
chunk instances, as follows: 

from chunk import Chunk

def PrintChunkInfo(ChunkedFile):
    while 1:
        try:
            CurrentChunk=Chunk(ChunkedFile)
        except EOFError:
            # Constructing a chunk failed, because we
            # finished reading the file. Exit loop:
            break
        print "ID:",CurrentChunk.getname()
        print "Size:",CurrentChunk.getsize()
        CurrentChunk.close()


Handling Raw Audio Data

The module audioop is a big box of handy functions for working with audio data. It
is implemented in C, for speed. Each function takes audio data as a fragment, a
sequence of linear-encoded samples, stored as a string. Most functions can handle
1-byte, 2-byte, or 4-byte sample widths, and they take the sample width as an
argument; a few can only handle 2-byte samples.

Examining a fragment 

The following functions each take two arguments, a fragment and a sample
width:

avg returns the average of all the samples in the fragment. avgpp returns the
average peak-peak value (with no filtering done). max returns the largest absolute
sample value. maxpp returns the largest peak-peak value. minmax returns a tuple of
the minimum and maximum samples. cross returns the number of zero-crossings in
the fragment. To measure the power of the fragment's audio signal, call rms
(root-mean-square).
The function getsample(fragment,width,index) returns sample number index from a
fragment. If the fragment is mono, that sample is also frame number index.
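These measurements are simple to state in code. The sketch below assumes 2-byte signed samples in the machine's native byte order (as the array module reads them) and recomputes max, minmax, rms, and a getsample-style lookup. It illustrates the definitions and is not guaranteed to match audioop bit for bit; the function name fragment_stats is hypothetical, and audioop itself was removed from the standard library in Python 3.13.

```python
import array
import math

def fragment_stats(fragment):
    """Pure-Python stand-ins for a few audioop measurements on a fragment
    of native-endian, signed 2-byte samples."""
    samples = array.array("h", fragment)        # decode the byte string
    if not samples:
        raise ValueError("empty fragment")
    return {
        "max": max(abs(s) for s in samples),    # largest absolute sample
        "minmax": (min(samples), max(samples)),
        "rms": int(math.sqrt(sum(s * s for s in samples) / len(samples))),
        "first": samples[0],                    # like getsample(fragment, 2, 0)
    }
```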

Searching and matching 

The function findfactor(target, fragment) attempts to match fragment with target.
It returns a float X such that fragment multiplied by X is as similar to target as
possible. The fragments target and fragment should contain 2-byte samples and be
the same length:

>>> QuietData=audioop.mul(Data,2,0.5) # half as loud
>>> audioop.findfactor(Data,QuietData) 

2.0001516619075197 

The function findfit(target,fragment) searches for fragment in target. It
returns a tuple of the form (offset,X). The closest match found starts at frame offset,
and is scaled by a factor of X. Here, target and fragment contain 2-byte samples, and
fragment is no longer than target.
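findfactor is a least-squares fit: the X that minimizes the summed squared difference between target and X times fragment is sum(target*fragment) / sum(fragment*fragment). The following sketch shows that formula, plus a brute-force findfit built on it, working on plain Python lists of sample values rather than byte strings. The names find_factor and find_fit are hypothetical; audioop's C implementation organizes the findfit search differently, so edge cases (such as silent windows) may not behave the same.

```python
def find_factor(target, fragment):
    """Least-squares scale X minimizing sum((t - X*f)**2) over paired samples."""
    energy = sum(f * f for f in fragment)
    if energy == 0:
        return 0.0                  # a silent fragment cannot be scaled to match
    return sum(t * f for t, f in zip(target, fragment)) / energy

def find_fit(target, fragment):
    """Brute-force search for the (offset, X) at which X*fragment best matches
    a same-length window of target."""
    n = len(fragment)
    if n == 0 or n > len(target):
        raise ValueError("fragment must be non-empty and no longer than target")
    best = None
    for offset in range(len(target) - n + 1):
        window = target[offset:offset + n]
        x = find_factor(window, fragment)
        error = sum((t - x * f) ** 2 for t, f in zip(window, fragment))
        if best is None or error < best[0]:
            best = (error, offset, x)
    return best[1], best[2]
```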

The function findmax(fragment, length) looks for the loudest part of a sound. It
finds a slice length samples long for which the audio signal (as measured by rms) is
as large as possible. It returns the offset of the start of the slice.
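Because rms grows with the sum of squared samples, findmax amounts to a sliding-window search for the largest energy. A sketch on a plain list of sample values (find_max is a hypothetical name) that updates the window sum in constant time per step:

```python
def find_max(samples, length):
    """Offset of the length-sample slice with the greatest energy (sum of
    squares), which is also the slice with the greatest rms."""
    if not 0 < length <= len(samples):
        raise ValueError("length must be between 1 and len(samples)")
    energy = sum(s * s for s in samples[:length])
    best_energy, best_offset = energy, 0
    for start in range(1, len(samples) - length + 1):
        # Slide the window one sample: add the new sample, drop the old one.
        energy += samples[start + length - 1] ** 2 - samples[start - 1] ** 2
        if energy > best_energy:
            best_energy, best_offset = energy, start
    return best_offset
```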

Translating between storage formats 

The audioop module can handle linear encoding, u-LAW, and Intel/DVI ADPCM. It
provides several functions for converting between these schemes, as shown in
Table 24-1.


Table 24-1
Audio Format Conversion Functions

Function                            Effect

lin2lin(fragment,width,newwidth)    Converts a linear-encoded fragment to a new
                                    sample width; returns the converted fragment.
                                    Decreasing sample width lowers sound quality
                                    but saves space; increasing sample width just
                                    uses up more space.

lin2adpcm(fragment,width,state)     Converts a linear-encoded fragment to 4-bit
                                    ADPCM encoding. The value state represents the
                                    encoder's internal state. The return value is
                                    (newfragment,newstate), where newstate should
                                    be passed for state to the next call to
                                    lin2adpcm. Pass None for state in the first
                                    call. lin2adpcm3 is a variant of lin2adpcm,
                                    using only 3 (not 4) bits per sample-difference.

adpcm2lin(fragment,width,state)     Converts an ADPCM-encoded fragment to linear
                                    encoding. Returns a tuple of the form
                                    (newfragment,newstate). adpcm32lin is a
                                    variant of adpcm2lin, for conversion from
                                    3-bit ADPCM.

lin2ulaw(fragment,width)            Converts a linear-encoded fragment to u-LAW
                                    encoding.

ulaw2lin(fragment,width)            Converts a u-LAW encoded fragment to linear
                                    encoding. (u-LAW encoding always uses 1-byte
                                    samples, so width affects only the output
                                    fragment.)
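u-LAW squeezes a wide dynamic range into 1-byte samples by spending more resolution on quiet sounds than on loud ones. The idealized continuous curve is F(x) = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu), with mu = 255. The sketch below evaluates that curve and its inverse for samples scaled to the range -1.0 to 1.0. The real G.711-style encoder behind lin2ulaw approximates this curve with segmented 8-bit codes, so these values are not byte-compatible with lin2ulaw's output; the function names are hypothetical.

```python
import math

MU = 255  # the mu parameter of the u-LAW curve

def ulaw_compress(x):
    """Map a sample in [-1.0, 1.0] onto the continuous u-LAW curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def ulaw_expand(y):
    """Inverse of ulaw_compress: recover the linear sample value."""
    return math.copysign((math.pow(1 + MU, abs(y)) - 1) / MU, y)
```

Note how a quiet sample such as 0.01 maps to roughly 0.23, so small signals get a much larger share of the available codes.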


In addition, you can convert linear-encoded fragments between mono and stereo.
tomono(fragment,width,lfactor,rfactor) converts a stereo fragment to a
mono fragment by multiplying the left channel by lfactor, the right channel by rfactor,
and adding the two channels. tostereo(fragment,width,lfactor,rfactor)
converts a mono fragment to stereo. The left channel of the new fragment is the
original fragment multiplied by lfactor, and similarly on the right.
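Stereo fragments store samples interleaved, left then right, frame by frame. Working on plain lists of 16-bit sample values (the names to_mono, to_stereo, and clip16 are hypothetical), tomono and tostereo reduce to slicing and scaling. This sketch clamps results to the 16-bit range, which is an assumption about overflow handling rather than a description of audioop's exact behavior.

```python
def clip16(value):
    """Clamp to the signed 16-bit range (the sketch's overflow policy)."""
    return max(-32768, min(32767, int(value)))

def to_mono(samples, lfactor, rfactor):
    """Interleaved stereo [L0, R0, L1, R1, ...] -> mono [L0*lf + R0*rf, ...]."""
    left, right = samples[0::2], samples[1::2]
    return [clip16(l * lfactor + r * rfactor) for l, r in zip(left, right)]

def to_stereo(samples, lfactor, rfactor):
    """Mono [S0, S1, ...] -> interleaved stereo [S0*lf, S0*rf, S1*lf, ...]."""
    stereo = []
    for s in samples:
        stereo.extend((clip16(s * lfactor), clip16(s * rfactor)))
    return stereo
```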

Most audioop functions do not differentiate between the left and right channels of
stereo audio. To work on one channel at a time, split the channels apart with tomono
and recombine them with tostereo:

>>> audioop.max(Data,2) # max over both channels
26155
>>> LeftChannel=audioop.tomono(Data,2,1,0) # left*1, right*0
>>> RightChannel=audioop.tomono(Data,2,0,1)
>>> audioop.max(RightChannel,2)
26155
>>> audioop.max(LeftChannel,2)
25556
>>> LoudLeftChannel=audioop.mul(LeftChannel,2,2)
>>> QuietRightChannel=audioop.mul(RightChannel,2,0.5)
>>> # Add the two channels back together:
>>> NewData=audioop.add(audioop.tostereo(LoudLeftChannel,2,1,0),
...     audioop.tostereo(QuietRightChannel,2,0,1),2)

Manipulating fragments 

The function add(fragment1, fragment2, width) combines two fragments of the
same length and sample width by adding each pair of samples.

The function reverse(fragment, width) reverses a sound fragment.

The function mul(fragment, width, factor) multiplies each sample in fragment by
factor, truncating any overflow. This has the effect of making the sound louder or
softer.
The function bias(fragment, width, bias) adds bias to each sample in fragment
and returns the result.
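These manipulations are per-sample arithmetic. A list-based sketch (hypothetical names, operating on 16-bit sample values rather than byte strings) makes the overflow handling explicit by saturating at the 16-bit limits; treat that as this sketch's choice, not a statement of exactly how audioop truncates.

```python
def clamp16(value):
    """Saturate a value to the signed 16-bit range instead of letting it wrap."""
    return max(-32768, min(32767, int(value)))

def scale_samples(samples, factor):
    """Like mul: multiply every sample by factor."""
    return [clamp16(s * factor) for s in samples]

def add_samples(samples1, samples2):
    """Like add: sum two same-length fragments sample by sample."""
    if len(samples1) != len(samples2):
        raise ValueError("fragments must have the same length")
    return [clamp16(a + b) for a, b in zip(samples1, samples2)]

def reverse_samples(samples):
    """Like reverse: the same samples in the opposite order."""
    return samples[::-1]
```

Reversing individual samples is fine for mono data; for interleaved stereo you would reverse whole frames instead, or the left and right channels would trade places.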

You can speed up or slow down a fragment by calling

ratecv(fragment,width,channels,inrate,outrate,state[,weightA[,weightB]])

Here, inrate and outrate are the frame rates of the input and output
fragments; what is important is the ratio between inrate and outrate. The parameter
state represents the internal state of the converter. ratecv returns a tuple of the
form (fragment,newstate), where the value newstate should be passed in as state for
the next call to ratecv. You can pass None for state in your first call. Finally, the
values weightA and weightB are used for a simple audio filter; weightA (which must
be at least 1) is a weight for the current sample, and weightB (which must be at
least 0) is a weight for the previous sample.

For example, the following code reads an audio file and slows it down to half-speed: 

>>> import wave, audioop, winsound
>>> WavFile=wave.open("green1.wav","rb")
>>> Params=WavFile.getparams()
>>> Data=WavFile.readframes(Params[3]) # Params[3]=framecount
>>> # outrate=2*inrate; twice as many frames per second means
>>> # the sound is half as fast:
>>> (NewData,State)=audioop.ratecv(
...     Data,Params[1],Params[0],1,2,None)
>>> NewFile=wave.open("green2.wav","wb")
>>> NewFile.setparams(Params)
>>> NewFile.writeframes(NewData)
>>> NewFile.close()
>>> winsound.PlaySound("green2.wav",winsound.SND_FILENAME)
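To see what ratecv does to the data, here is a rough stand-in on a list of mono sample values using plain linear interpolation. The function name resample_linear is hypothetical, and audioop's converter also keeps state across calls and applies the weightA/weightB filter, which this sketch does not. Doubling the rate, as in the example above, yields twice as many samples; played back at the original frame rate, the sound lasts twice as long and drops an octave.

```python
def resample_linear(samples, inrate, outrate):
    """Resample a list of mono sample values from inrate to outrate using
    linear interpolation between neighboring input samples."""
    if not samples:
        return []
    count = len(samples) * outrate // inrate
    out = []
    for i in range(count):
        pos = i * inrate / outrate                  # position in the input
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]  # repeat the last sample at the end
        out.append(round(samples[j] + (nxt - samples[j]) * frac))
    return out
```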


Summary 

Sound can be stored in many file formats. Python’s standard libraries can read and
write most sound files, and perform low-level manipulation of audio data. They also
enable you to play sound on many operating systems. In this chapter, you:

♦ Played a musical scale on a PC speaker.

♦ Parsed sound files in various formats, and stored sounds in reverse.

♦ Manipulated raw audio data.

In the next chapter, you learn how to create and manage multiple threads in your 
Python programs. 
This chapter describes the modules that help you work
with graphics files. Python comes with modules that help
you identify image file types, convert between different color
systems, and handle raw image data.


Image Basics

Computer images are made up of a group of pixels, or picture 
elements, and an image’s size is usually specified by its width 
and height in pixels. 

There is a mind-boggling number of file formats that you can
use to store images; fortunately, however, a few (such as GIF,
JPEG, and PNG) are popular enough to be considered standard.
Some image file formats limit the number of different
colors you can have in the image (GIFs, for example, can use
any 256 out of 16,777,216 different colors), and some
represent each pixel by its index in a palette of color definitions.

Image file formats store the data either in raw, or
uncompressed, form, or they apply some sort of compression to
make the data smaller. Compression techniques fall into two
categories: Lossless compression (as is used by GIF files)
means that no data is lost and that when a viewer
decompresses the image and displays it, it is identical to the original.
Lossy compression (as is used by JPEG files) means some
detail is thrown away in order to achieve better compression.

Some image file formats also support transparency, so that if
you display the image over another image, the parts that were
marked as transparent let the underlying image show through.
Index-based formats tag a particular color as the
transparent color (so that pixels having that index value are
completely transparent), and other formats include an alpha
channel that tells the degree of transparency of each pixel.
Identifying Image File Types

The imghdr module makes an educated guess as to the type of image stored in a
file:

>>> import imghdr
>>> imghdr.what('c:\\temp\\jacobSwingSleep.jpg')
'jpeg'

It looks at the first few bytes of the header, not the entire file, so it doesn’t guaran- 
tee file integrity, but it does serve to differentiate between valid types. Instead of 
passing in a file name, you can pass in a string that contains the first few bytes of a 
file: 

>>> hdr = open('snake.bmp','rb').read(50) # Read a little
>>> imghdr.what('',h=hdr)
'bmp'

Table 25-1 lists the values that the what function returns and the different file types
that imghdr recognizes.


Table 25-1
Image Types Recognized by imghdr

Image Type                            Value Returned

CompuServe Graphics Interchange       gif
JFIF Compliant JPEG                   jpeg
Windows or OS/2 Bitmap                bmp
Portable Network Graphics             png
SGI Image Library (RGB)               rgb
Tagged Image File Format              tiff
Portable Bitmap                       pbm
Portable Pixmap                       ppm
Portable Graymap                      pgm
Sun Raster Image                      rast
X11 Bitmap                            xbm
By adding to imghdr’s tests list of functions, you can have it check for additional
file types. Nothing in this mechanism is specific to images: what simply runs each
test function against the first bytes of the file. The following example looks for the
special prefix at the beginning of all bytecode-compiled Python (.pyc) files:

>>> def test_pyc(h,f):
...     import imp
...     if h.startswith(imp.get_magic()):
...         return 'pyc'
...
>>> imghdr.tests.append(test_pyc)
>>> imghdr.what('leptolex.pyc')
'pyc'

Custom test functions like the one shown in the preceding example take two
parameters. The first, h, is a string of bytes: either the first 32 bytes of the file (if
what was called with a file name) or the string of bytes the caller passed in to what.
The second, f, is the open file object, positioned just past the bytes read for h, when
what was called with a file name; otherwise it is None. A test function returns a type
name if it recognizes the data, or None to let the next test try.


Converting Between Color Systems 

A color system is a model that represents the different colors that exist; color
systems make it possible to refer to colors numerically. By converting a color to a
number, things like television signals and computer graphics become possible. Each
color system has its own set of advantages, and the colorsys module helps you
convert colors from one system to another.

Color systems

colorsys supports conversion between four of the most popular color systems.
In each, a color is represented by a 3-tuple of numbers, normally from 0.0 to 1.0
(the I and Q components of a YIQ color can also be negative).

RGB 

If you’ve worked with computer graphics, then the RGB or red-green-blue color
system is probably somewhat familiar; it’s the color system used by most computer
software and hardware. This model is derived from the tristimulus theory of vision,
which states that there are three visual pigments in the cones in the retinas of our
eyes. When they are stimulated, we perceive color. The pigments are red, green,
and blue.
YIQ

The YIQ color system is the one used by the National Television System Committee
(NTSC), the standards body for television signals in the United States. Unlike RGB,
which has three distinct signals, TVs have only a single composite signal. To make
matters more complicated, the same signal must work with both black-and-white
and color television sets. The Y component in a YIQ color is the brightness
(luminance) of the color. It is the only component used by black-and-white
televisions, and is given the overwhelming majority of the TV signal bandwidth. The
I component contains orange-cyan hue information, which provides the coloring
used in flesh tones. The Q component has green-magenta hue information, and is
given the least amount of signal bandwidth.

HLS 

For people, the HLS, or hue-lightness-saturation, color system is more intuitive than
RGB because you can specify a color by first choosing a pure hue (such as pure
green) and then adding different amounts of black and white to produce tints,
tones, and shades. The L component is the lightness, where 1.0 is white and 0.0 is
black. S is the saturation level of the hue; 1.0 is fully saturated (the pure hue),
whereas 0.0 is completely unsaturated, giving you just a shade of gray.

HSV 

The HSV, or hue-saturation-value, system is very close to the HLS model, except that
the pure hues have a V component (corresponding to L in HLS) of 1.0 rather than 0.5.

Converting from one system to another 

colorsys contains functions for converting from RGB to any of the other systems,
and from any of the others to RGB:

>>> import colorsys 

>>> colorsys.hls_to_rgb(0.167,0.5,1.0) # Yellow
(0.998, 1.0, 0.0) 

To convert from HLS to YIQ, for example, you use a two-step process: converting
first to RGB and then from RGB to YIQ. Of course, if you were planning to do many
such conversions, it would be worthwhile to write your own function to convert
directly between the two.
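A minimal sketch of the two-step route, wrapped in functions so the detour through RGB happens in one place. The wrapper names hls_to_yiq and yiq_to_hls are our own; the colorsys calls inside them are the module's real functions.

```python
import colorsys

def hls_to_yiq(h, l, s):
    """Convert HLS to YIQ by going through RGB, since colorsys has no direct route."""
    return colorsys.rgb_to_yiq(*colorsys.hls_to_rgb(h, l, s))

def yiq_to_hls(y, i, q):
    """The reverse trip, again via RGB."""
    return colorsys.rgb_to_hls(*colorsys.yiq_to_rgb(y, i, q))
```

Because colorsys rounds its YIQ coefficients, a round trip through YIQ returns values that are close to, but not always exactly, the originals.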

Although these routines use color parameters in the range from 0.0 to 1.0, it’s also
common to see each parameter specified using an integer range from 0 to 255 (the
values that fit in a single byte of memory). To convert to that format, just multiply
each component by 255. This format reduces the number of unique colors you can
specify (down to around 16.8 million), but don’t worry: the human eye can’t really
distinguish between more than about 83,000 anyway.
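A small sketch of that conversion, including the '#RRGGBB' string form that Tkinter and HTML accept. The helper names are hypothetical, and this version rounds to the nearest integer after multiplying by 255, whereas simply truncating (as int() alone does) can land one step lower.

```python
def to_bytes(color):
    """Scale a 0.0-1.0 color tuple to 0-255 integers, rounding to nearest."""
    return tuple(int(round(c * 255)) for c in color)

def to_hex(color):
    """Format a 0.0-1.0 RGB tuple as a '#RRGGBB' string."""
    return "#%02X%02X%02X" % to_bytes(color)
```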
Listing 25-1 is a color choosing utility. You choose a color using the HLS color
system and it shows that color along with its RGB equivalent, as shown in Figure 25-1.


Listing 25-1: choosecolor.py, an HLS-to-RGB color converter


from Tkinter import *
import colorsys

def update(*args):
    'Get the scale values and change the canvas color'
    r,g,b = colorsys.hls_to_rgb(h.get()/255.0,
                                l.get()/255.0,s.get()/255.0)
    # Convert to integers so the %02X format below gets whole numbers
    r,g,b = int(r*255),int(g*255),int(b*255)
    rgb.configure(text='RGB:(%d,%d,%d)' % (r,g,b))
    c.configure(bg='#%02X%02X%02X' % (r,g,b))

# Create a window with 3 scales and a canvas
root = Tk()

hue = Label(root,text='Hue')
hue.grid(row=0,column=0)

light = Label(root,text='Lightness')
light.grid(row=0,column=1)

sat = Label(root,text='Saturation')
sat.grid(row=0,column=2)

rgb = Label(root,text='RGB:(0,0,0)')
rgb.grid(row=0,column=3)

h = Scale(root,from_=255,to=0,command=update)
h.grid(row=1,column=0)

l = Scale(root,from_=255,to=0,command=update)
l.grid(row=1,column=1)

s = Scale(root,from_=255,to=0,command=update)
s.grid(row=1,column=2)

c = Canvas(root,width=100,height=100,bg='Black')
c.grid(row=1,column=3)

root.mainloop()
Figure 25-1: This utility converts colors
from the HLS system to the RGB system.


Handling Raw Image Data

Python works well as a general-purpose programming language, and often leaves
special-purpose functionality up to third-party developers. As such, Python’s
built-in support for handling raw image data is meager at best.

The imageop module manipulates raw image data that you pass it as a Python
string of bytes. The data must be either 8-bit (each pixel is represented by one
character in the string) or 32-bit (4 characters per pixel; each group of 4 characters
represents red, green, blue, and alpha or transparency components for that pixel). How
you go about obtaining data in that format is up to you, but if you’re on an SGI
computer, you can use the imgfile module. In addition, if you have an SGI RGB file, you
can load it using the rgbimg module, and then pass its contents to imageop.

imageop has a few functions for cropping and scaling images, but the bulk of its
functions have to do with converting between grayscale images of different color
depths (for example, converting from a 2-bit grayscale image to an 8-bit grayscale
image).
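Converting between grayscale depths mostly comes down to bit shifting. The sketch below maps 8-bit gray values to 2-bit levels and back. It uses hypothetical function names and keeps one pixel value per list item; imageop itself packs several low-depth pixels into each byte, and it is no longer part of the standard library in Python 3.

```python
def grey8_to_grey2(pixels):
    """Reduce 8-bit gray values (0-255) to 2-bit levels (0-3) by keeping the
    top two bits; no dithering, and one level per list item (not packed)."""
    return [p >> 6 for p in pixels]

def grey2_to_grey8(levels):
    """Expand 2-bit levels (0-3) back to the full 0-255 range."""
    return [level * 85 for level in levels]   # 255 / 3 == 85
```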

For real image processing, see the next section for information about available
third-party modules.


Using the Python Imaging Library 

If you plan to do a lot of image processing, check out the Python Imaging Library
(PIL) from Pythonware (www.pythonware.com). It is free for both private and
commercial use, and Pythonware also has commercial support plans available. It’s
painless to install and is well worth the download.

PIL is fast, and its wide range of features enables you to perform a number of image
processing tasks, including converting between different file formats; processing
images (cropping, resizing, and so forth); annotating existing images with text; and
creating new images from scratch with its drawing functions.
Tip


The next few sections show you how to get started with PIL; consult its online
documentation for even more features.


Visit the Graphics section in the Vaults of Parnassus (www.vex.net/parnassus/)
for plenty of other graphics and image processing utilities.


Retrieving image information

The main module in PIL is Image, and you use it to open and create images: 

>>> import Image
>>> i = Image.open('shadowtest.bmp')
>>> i.mode
'RGB'
>>> i.size
(320, 240)
>>> i.format
'BMP'
>>> i.getbands()
('R', 'G', 'B')
>>> i.show() # Displays the image


An image’s mode specifies its color depth and storage; some of the common values 
are listed in Table 25-2. 


Table 25-2
PIL Mode Values

Mode      Description

1         1-bit pixels, black and white
L         8-bit pixels, grayscale
P         8-bit pixels, using a 256-color palette
RGB       3 bytes per pixel, true color
RGBA      4 bytes per pixel, true color with alpha (transparency) band
I         32-bit integer pixels
F         32-bit floating point pixels


Images have one or more bands, or components, of data. For example, each pixel in 
an RGB image has a red, green, and blue component; that image is said to have 
three bands. 
One nice feature of PIL is that it waits to read and decode file data until it really
needs to. This means, for example, that you can open an enormous image and read
its size and type information very quickly.

Copying and converting images 

The copy() method returns a new image object identical to the old one, so that
you can make changes without modifying the original.

convert(mode) returns a new image in the given mode (there are also variations 
on this method that let you pass in a palette or even a conversion matrix). The fol- 
lowing example loads a full-color JPEG image, converts it to a 1-bit black-and-white 
image, and displays it as shown in Figure 25-2: 

>>> img = Image.open('binky.jpg')
>>> img.show() # Show the original
>>> img.convert('1').show() # Show the new version

The save(filename) method writes the contents of the current image to a file. PIL
looks at the extension of the file name you give it and saves the image in the
corresponding format. For example, if you have an image file named test.jpg, you can
convert it to a GIF as follows:

>>> Image.open('test.jpg').save('test.gif') 

JPEG files are true color, whereas GIF uses a 256-color palette; PIL takes care of
the necessary conversion as it saves the file.

As mentioned earlier, PIL waits as long as possible before loading and decoding file
data, so even if you open an image, its pixel data isn’t read until you display it or
apply some conversion. Therefore, you can use the draft(mode, (w,h)) method
to instruct the image loader to convert the image as it is loaded. For example, if you
have a huge 5,000 x 5,000-pixel, full-color image and you only want to work on a
smaller, 256-color copy of it, you can use something like the following:

img = Image.open('huge.jpg')
img.draft('P',(250,250)) # adjusts img in place, before its data is loaded


An image’s show() method is a debugging facility that saves the image to a
temporary file and launches the default viewer for that file type.

Sometimes the show() command has trouble working from inside IDEs such as 
PythonWin. 
Figure 25-2: A true color image (on the left) and a 1-bit version (on the right) after using
the convert() method


Using PIL with Tkinter 

The ImageTk module provides two classes, BitmapImage and PhotoImage, that
create Tkinter-compatible bitmaps and images that can be used anywhere Tkinter
expects a bitmap (black-and-white image) or image (color image). Not only can you
then use PIL’s image processing features in any Tkinter program, you can also use it
to load image formats that Tkinter doesn’t understand.

Refer to Chapters 19 and 20 for coverage of Tkinter. 




PIL also has functions for creating a Windows-compatible bitmap (DIB) that can 
be drawn into a Windows device context, and functionality for writing images out 
to PostScript files or printers. 
Cropping and resizing images

The crop((left, top, right, bottom)) method returns a rectangular portion of an
image.

resize((w, h)[, filter]) returns a resized copy of an image. The filter argument
controls what sort of sampling is used against the original image, and can be
one of BILINEAR, BICUBIC, or NEAREST (the default).

One other useful method is thumbnail((w, h)), which resizes the image in place
while maintaining the original aspect (width-to-height) ratio. Because of this, the
result may not be exactly the size you pass in; it will be no larger than that size.
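The arithmetic behind an aspect-preserving resize is a single scale factor, the smaller of the width and height ratios. This sketch (thumbnail_size is a hypothetical helper, and PIL's own rounding may differ by a pixel) computes the size such a resize aims for, never enlarging the image:

```python
def thumbnail_size(size, box):
    """Largest (w, h) that fits inside box while keeping size's aspect ratio.
    Never enlarges, since a thumbnail is only ever shrunk."""
    w, h = size
    bw, bh = box
    scale = min(bw / w, bh / h, 1.0)
    return max(1, int(round(w * scale))), max(1, int(round(h * scale)))
```

For example, a 320 x 240 image asked to fit in 100 x 100 comes out 100 x 75, not 100 x 100.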

Modifying pixel data

You can access and change the value of any image pixel by its (x,y) coordinates,
with (0,0) being the upper-left corner. Like Python slices, coordinates refer to the
spaces between pixels, so a rectangle with its upper-left and lower-right corners at
(0,0) and (20,10) would be 20 pixels wide and 10 tall.
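Because coordinates name the gaps between pixels, a box maps directly onto Python slices: the region's width is right - left and its height is bottom - top. A sketch on a plain list-of-rows grid (crop here is a hypothetical helper, not PIL's method):

```python
def crop(rows, box):
    """Crop a row-major grid (a list of rows) with a (left, top, right, bottom)
    box whose coordinates fall between pixels, exactly like slice bounds."""
    left, top, right, bottom = box
    return [row[left:right] for row in rows[top:bottom]]
```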

The getpixel((x, y)) and putpixel((x, y), value) methods get and set individual
pixels, where value is in the appropriate form for the image’s mode. The following
code opens an image, paints a black band across it, and displays the results
(shown in Figure 25-3):

>>> i = Image.open('shadowtest.bmp')
>>> i.getpixel((10,25))
(156, 111, 56)
>>> for y in xrange(50,60):
...     for x in xrange(i.size[0]):
...         i.putpixel((x,y), (0,0,0))
...
>>> i.show()



Figure 25-3: Use getpixel and putpixel
to operate on individual pixel values.










getdata() returns a sequence containing the value of each pixel in the image (tuples,
for multiband images such as RGB), and putdata(data[, scale][, offset]) copies a
sequence of pixel values into the image, starting at the upper-left corner. Each value
is adjusted as value*scale + offset before it is stored; scale defaults to 1.0 and
offset to 0.0.

PIL’s ImageDraw module provides Draw objects that let you draw shapes and text 
on an image. The following example displays the image shown in Figure 25-4: 

>>> import Image, ImageDraw
>>> from random import randrange
>>> img = Image.open('happy.jpg')
>>> draw = ImageDraw.Draw(img)
>>> points = [] # Create a list of random points
>>> for i in xrange(10):
...     points.append((randrange(img.size[0]),  # x
...                    randrange(img.size[1]))) # y
...
>>> draw.line(points)
>>> img.show()



Figure 25-4: The ImageDraw module lets you draw 
shapes and text on images. 


Listing 25-2 takes the current time and creates a GIF file containing an analog clock
face image, as shown in Figure 25-5. On-the-fly image generation is often useful in
creating dynamic content for Web pages (and if there’s anything the world needs,
it’s yet another time display on a Web page).
Listing 25-2: clockgif.py, which generates a clock
face showing the current time


import time,Image,ImageDraw

def centerbox(rmax,perc):
    'Returns a coordinate box perc % of rmax'
    sub = rmax*perc/100.0
    return (rmax-sub,rmax-sub,rmax+sub,rmax+sub)

r = 100 # clock face radius

img = Image.new('RGB',(r*2,r*2),color=(128,128,128))
draw = ImageDraw.Draw(img)

# Make the clock body
draw.pieslice(centerbox(r,100),0,360,fill=(0,0,0))
draw.pieslice(centerbox(r,98),0,360,fill=(80,80,255))
draw.pieslice(centerbox(r,94),0,360,fill=(0,0,0))
draw.pieslice(centerbox(r,93),0,360,fill=(255,255,255))

# Draw the tick marks
for i in range(12):
    deg = i * 30
    draw.pieslice(centerbox(r,90),deg-1,deg+1,fill=(0,0,0))

draw.pieslice(centerbox(r,75),0,360,fill=(255,255,255))

# Get the current time
now = time.localtime(time.time())
hour = now[3] % 12
minute = now[4]

# Draw the hands
hdeg = hour * 30 + minute / 2
mdeg = minute * 6

draw.pieslice(centerbox(r,50),hdeg-4,hdeg+4,fill=(100,100,100))
draw.pieslice(centerbox(r,85),mdeg-2,mdeg+2,fill=(100,100,100))

#img.rotate(90).show() # For debugging

img.rotate(90).save('currenttime.gif')
Figure 25-5: With PIL, it's easy
to create on-the-fly images.


As you may have noticed, this example makes heavy use of Draw’s
pieslice((left, top, right, bottom), startangle, stopangle[, fill][, outline])
method. In order to make it easy to use a different size clock, all measurements
are calculated as percentages of the radius (therefore, changing the value of r is
all you need to do). The centerbox function is a helper function that returns a
square enclosing a circle of the right size.

One other thing to notice is that an angle of zero is directly to the right of center, 
and angle measurements are clockwise from there. Instead of working around that 
in the calculations for the placement of the clock hands, it was easier to just draw 
them as if the clock were on its side, and then rotate the entire image by 90 degrees 
(note that image rotation degrees are counterclockwise). 
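The alternative mentioned above, working around PIL's angle convention instead of rotating the finished image, needs only a 90-degree offset. A sketch of that arithmetic with hypothetical function names: hand_angles reproduces the listing's hdeg and mdeg formulas, measured clockwise from 12 o'clock, and to_pil_angle shifts such an angle into PIL's clockwise-from-3-o'clock convention, so the hands could be drawn without the final rotate(90).

```python
def hand_angles(hour, minute):
    """Clock-face angles in degrees, measured clockwise from 12 o'clock."""
    hour_deg = (hour % 12) * 30 + minute / 2   # 30 degrees per hour, plus drift
    minute_deg = minute * 6                    # 6 degrees per minute
    return hour_deg, minute_deg

def to_pil_angle(clock_deg):
    """Convert a clockwise-from-12 angle to PIL's clockwise-from-3-o'clock angle."""
    return (clock_deg - 90) % 360
```

At 3:30, for instance, the hour hand sits at 105 degrees and the minute hand at 180, which become 15 and 90 in PIL's convention.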

The following list contains the more common methods of a Draw object: 

setink(ink)
setfill(onoff)
setfont(font)
arc((x1, y1, x2, y2), start, end[, fill])
bitmap((x, y), bitmap[, fill])
chord((x1, y1, x2, y2), start, end[, fill][, outline])
ellipse((x1, y1, x2, y2)[, fill][, outline])
line((x, y)[, fill])
pieslice((x1, y1, x2, y2), start, end[, fill][, outline])
point((x, y)[, fill])
polygon((x, y)[, fill][, outline])
rectangle((x1, y1, x2, y2)[, fill][, outline])
text((x, y), text[, fill][, font][, anchor])
Other PIL features

New versions of PIL continue to add powerful new features; check the Pythonware
Web site (www.pythonware.com) for new versions and more documentation. Other
interesting PIL modules and features include:

♦ ImageEnhance: Contains classes for adjusting the color, brightness, contrast,
and sharpness of an image

♦ ImageChops: Provides arithmetic image operations (adding and subtracting
images) as well as functions for lightening, darkening, and inverting images

♦ Support for creating animated (multiframe) GIF and FLI/FLC images

♦ Transformations, including rotating at arbitrary angles and applying a Python
function to each pixel

♦ Image filters for blurring images or finding edges

♦ The capability to add your own decoders for new image types

Summary 

Python offers helpful support for processing images, such as modules for
identifying image file types. In this chapter, you:

♦ Learned about the information commonly stored in image files.

♦ Identified file types using the imghdr module.

♦ Converted colors between different color systems such as RGB and HLS.

♦ Modified images using the Python Imaging Library.

The next chapter shows you how to create multithreaded applications so that your 
programs can work on more than one task at a time. 



Multithreading 



Running several threads is similar to running several
different programs concurrently, but with the following
benefits:

Threads can easily share data, so writing threads that 
cooperate is simpler than making different programs 
work together. 

Threads do not require much memory overhead; they 
are cheaper than processes. (In the UNIX world, threads 
are often called light-weight processes^ 


Understanding Threads 

Threads are useful in many situations where your program needs to perform several tasks that aren’t necessarily interdependent. Programs with a GUI, for example, often use two threads: one to handle user interface jobs such as repainting the window, and one to handle the “heavy lifting,” such as talking to a database. Other times, threads are useful because it’s more logical to divide work into distinct parts. For example, a game might have a separate thread for each computer-controlled player or object.

A thread may be interrupted by another thread at any time. 
After any line of code, the Python interpreter may switch to 
another thread. 

Note: Some programmers call this interruption timeslicing. However, strictly speaking, timeslicing refers to the vaguely Communist notion of giving every thread equal amounts of CPU time.

The interpreter checks for thread switching once every few bytecode instructions. The check interval, which you can set with sys.setcheckinterval (it defaults to 10), is the number of bytecodes between checks.
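In Python 3.2 and later, the interpreter switches threads on a time interval instead of a bytecode count, and sys.setcheckinterval was removed entirely in Python 3.9. A minimal sketch, assuming a modern Python 3 interpreter, that reads and temporarily changes the switch interval:

```python
import sys

# The switch interval is measured in seconds; the documented default is 0.005.
old_interval = sys.getswitchinterval()
print(old_interval)

sys.setswitchinterval(0.001)         # request more frequent thread switches
print(sys.getswitchinterval())
sys.setswitchinterval(old_interval)  # restore the previous setting
```

The value is a request rather than a guarantee; the operating system still decides when each thread actually gets the CPU.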


In This Chapter

Understanding threads
Spawning, tracking, and killing threads
Avoiding concurrency issues
Preventing deadlock
Example: downloading from multiple URLs
Porting threaded code
Weaving threads together with Queue
The switching is transparent to you, and exactly when the switch happens is up to the Python interpreter, the operating system, and the phase of the moon: In a multithreaded program, the order of execution may change from one run to the next.

This unpredictability is the reason why multithreading can be trickier than single- 
threaded programming: A buggy program might work nine times out of ten, and 
then crash the tenth time because the order of execution was different. 

In general, you create all threads (other than the main thread) yourself. However, a C extension may create dummy threads to do its work. Talking to these threads from Python is difficult, so be forewarned if you want to communicate with dummy threads. A long-running calculation in an extension module effectively counts as one instruction, so be aware that other threads may have to wait a while for a dummy thread to take its turn!


Spawning, Tracking, and Killing Threads 

Python features two multithreading modules, thread and threading. The modules overlap enough that you can choose the one you like best and use it exclusively. threading is a high-level module that calls thread for lower-level operations. threading includes a Thread class similar to Java’s Thread class, so it is a good choice for Java veterans. We included two versions of this chapter’s example, one using thread and one using threading, to illustrate the workings of both.

Creating threads with the thread module 

To spawn another thread, call

start_new_thread(function, args[, kwargs])

The function call returns immediately, and the child thread starts and calls function; when function returns, the thread terminates. function can be an object method. args is a tuple of arguments; use an empty tuple to call function without passing any arguments. kwargs is an optional dictionary of keyword arguments.
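In Python 3 the thread module is named _thread, but start_new_thread keeps the same signature. The sketch below is an illustration rather than the book's code: calc_sum is a hypothetical stand-in for a real work function, and a lock held by the main thread serves as a simple "done" signal, because start_new_thread itself gives you no way to wait for the child.

```python
import _thread

results = []
finished = _thread.allocate_lock()
finished.acquire()  # held by the main thread until the child is done


def calc_sum(start, count):
    # Runs in the child thread; a hypothetical stand-in for real work.
    results.append(sum(range(start, start + count)))
    finished.release()  # a thread-module lock may be released by any thread


# args must be a tuple; the call returns immediately.
_thread.start_new_thread(calc_sum, (1, 10))
finished.acquire()  # blocks until the child releases the lock
print(results)  # [55]
```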

Here are two ways of starting a new thread: 

thread.start_new_thread(NewThread.run, ())

thread.start_new_thread(CalcDigitsOfPi, (StartDigit, NumDigits))

Each thread has an ID number, which you can see by calling thread.get_ident(). The ID is unique at any given time, but if a thread dies, a new thread may reuse its ID.

If several threads print log messages, it can become hard to determine which thread said what; it’s like a party where everyone talks at once. The following example uses thread identifiers to indicate which thread is talking:
import thread

def PrintDebugMessage(ThreadNameDict, Message):
    CurrentThreadID = thread.get_ident()
    # Look up the thread name in the name dictionary. If
    # there is no name entry for this ID, use the ID.
    CurrentThreadName = ThreadNameDict.get(CurrentThreadID, CurrentThreadID)
    print CurrentThreadName, Message

A thread terminates when its target function terminates, when it calls thread.exit(), or when an unhandled SystemExit exception is raised.

Python raises the exception thread.error if a threading error occurs.

Starting and stopping threads with the threading module

threading defines a Thread class to handle threads. To spawn a new thread, you first create a Thread object and then call its start() method. start() creates the actual thread and starts the target function; you should call start() only once per thread.

The Thread constructor takes several keyword arguments, all of which are 
optional: 

♦ target: Function to call when you start() the thread. Defaults to None. You should pass a value for target unless you override the run() method of Thread in a subclass; otherwise, your thread will not do anything.

♦ name: String name of this thread. The default name is of the form “Thread-n,” where n is a small decimal number.

♦ args: A tuple of arguments to pass to the target function. Empty by default.

♦ kwargs: Keyword argument dictionary to pass to the target function. Empty by default.

♦ group: Currently unused. In the future, it will designate a thread group.

This code uses a Thread object to run the function CalcDigitsOfPi in a new thread: 

PiThread = Thread(target=CalcDigitsOfPi,
                  args=(StartDigit, NumDigits))
PiThread.start()
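Putting the keyword arguments together, here is a sketch for Python 3's threading module; calc_digits is a hypothetical placeholder for CalcDigitsOfPi that only records the arguments it receives, and join() lets the main thread inspect the result afterwards.

```python
import threading

results = {}


def calc_digits(start_digit, num_digits, label="pi"):
    # Hypothetical stand-in for CalcDigitsOfPi: just records its arguments.
    results[label] = (start_digit, num_digits)


pi_thread = threading.Thread(target=calc_digits,
                             name="PiWorker",
                             args=(0, 100),
                             kwargs={"label": "pi"})
pi_thread.start()  # call start() exactly once; it runs run() in the new thread
pi_thread.join()   # wait for the thread to finish before reading results
print(pi_thread.name, results)  # PiWorker {'pi': (0, 100)}
```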

You can create a subclass of the Thread class and override the run() method to do what you want. This is a good approach if you are tracking thread-specific data.

You should not override methods other than __init__() and run(). If you override __init__, you should call the __init__ method of the parent class in your constructor:
class SearchThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        # Now carry on constructing...
        self.Matches = {}
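A fuller version of the subclassing pattern, sketched for Python 3 under the assumption that each thread searches its own list of lines: run() does the work, and the results stay on the thread object in Matches, which is the kind of thread-specific data the text mentions. The search logic itself is hypothetical.

```python
import threading


class SearchThread(threading.Thread):
    """Hypothetical worker that records which lines contain a word."""

    def __init__(self, lines, word):
        threading.Thread.__init__(self)  # or: super().__init__()
        self.lines = lines
        self.word = word
        self.Matches = {}  # thread-specific result data

    def run(self):
        # Overriding run() replaces the default "call target" behavior.
        for number, line in enumerate(self.lines):
            if self.word in line:
                self.Matches[number] = line


searcher = SearchThread(["spam", "eggs", "spam and eggs"], "spam")
searcher.start()
searcher.join()
print(sorted(searcher.Matches))  # [0, 2]
```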

Threads can be flagged as daemon threads. The main thread (and therefore, your Python program) keeps running as long as there are non-daemon threads running; if only daemonic threads are running, the script exits. You set daemon status with setDaemon(boolean) and check it with isDaemon(). You must set a thread’s daemon status before calling start(). Child threads inherit their daemonic status from their parents.
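In Python 3 you can pass daemon=True to the Thread constructor or set the daemon attribute; setDaemon() and isDaemon() still exist but are deprecated spellings. This sketch, whose polling function is hypothetical, also shows that changing the flag after start() is refused:

```python
import threading
import time


def background_poll():
    # Hypothetical housekeeping loop that never returns on its own.
    while True:
        time.sleep(0.1)


worker = threading.Thread(target=background_poll, daemon=True)
worker.start()
print(worker.daemon)                   # True
print(threading.main_thread().daemon)  # False

try:
    worker.daemon = False  # too late: the thread is already running
except RuntimeError as exc:
    print("RuntimeError:", exc)
# When the main thread finishes, the interpreter exits without
# waiting for background_poll to return.
```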

Note: In a programming context, a daemon is a process or thread that silently handles some ongoing, invisible task. If you should encounter a daemon outside a programming context, we recommend you avoid signing anything.

Thread status and information under threading

You can check whether a Thread object is running with the method isAlive(). isAlive() returns true if someone has called start() and the run() method has not returned yet.

Each thread has a name, an arbitrary string that you can access with getName() and setName(newName).
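Python 3 spells these as is_alive() and the name attribute (isAlive() was removed in Python 3.9; getName() and setName() are deprecated aliases). A small sketch that holds a thread open with an Event so its status can be observed before, during, and after run():

```python
import threading

release = threading.Event()


def wait_for_signal():
    release.wait()  # stay alive until the main thread sets the event


t = threading.Thread(target=wait_for_signal, name="Waiter")
print(t.is_alive())          # False: not started yet
t.start()
print(t.is_alive())          # True: run() has not returned
t.name = "Renamed"           # modern spelling of setName()
release.set()
t.join()
print(t.is_alive(), t.name)  # False Renamed
```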

Finding threads under threading 

threading.enumerate() returns a list of all active Thread objects. This includes dummy threads, daemon threads, and the main thread. Because the list contains the main thread, it is never empty. threading.activeCount() returns the number of active threads; this number is equal to the length of the list returned by threading.enumerate().

A call to threading.currentThread() returns an object corresponding to the current thread of control. (Even the main thread has a corresponding Thread object.)
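The Python 3 names are threading.enumerate(), threading.active_count(), and threading.current_thread(). This sketch starts three hypothetical worker threads that simply wait on an Event, so they are guaranteed to be alive while the list is taken:

```python
import threading

gate = threading.Event()
workers = [threading.Thread(target=gate.wait, name="Worker-%d" % i)
           for i in range(3)]
for w in workers:
    w.start()

active = threading.enumerate()  # includes the main thread, so never empty
names = [t.name for t in active]
print(threading.active_count(), len(active))
print(threading.current_thread().name)  # "MainThread" when run from the main thread

gate.set()  # let the workers finish
for w in workers:
    w.join()
```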

Waiting for a thread to finish 

To wait for a Thread to terminate, call its join method. join blocks until the thread dies; it returns immediately if the thread has already terminated. Joining a thread that has not been started yet is an error. join also accepts an optional timeout, in seconds.

For example, the following lines of code (executed in the main thread) wait until all the other currently active threads have terminated:

ThisThread = threading.currentThread()
while (threading.activeCount() > 1):
    CurrentActiveThreads = threading.enumerate()
    for WaitThread in CurrentActiveThreads:
        # Don't wait for myself:
        if (WaitThread != ThisThread):
            # (A daemon thread that never finishes would make
            # this join() wait forever.)
            WaitThread.join()
    # Now all those threads have finished. We are now the
    # only running thread, unless someone spawned new threads
    # while we were waiting. If that happened, we make
    # another pass through the while loop. (If that can't
    # happen, the while loop is superfluous.)


Avoiding Concurrency Issues 

“Oh, what a tangled web we weave, when first we practice to... implement multithreaded database access.”

— Sir Walter Scott, as reinterpreted by an unnamed programmer at 2 a.m. 

Imagine a chalkboard on which three professors are each writing some information. The professors are so deep in thought that they are blissfully unaware of one another’s presence. Each professor erases some chalk marks, writes on the board, pauses to think, and then continues. In the process, the professors erase bits of each other’s writings, and the resulting blackboard is a mess of unrelated word salad.

Threads run in the same address space, so they can access the same data. Unfortunately, threads can also break other threads’ data if they do not cooperate. The professors and the chalkboard illustrate what can go wrong. The phrase concurrency issues is a catchall term for all the ways in which two threads, working together, may put data into an unusable form. A program or object free of concurrency issues is called thread-safe.

Returning to the chalkboard example: Everything would have been fine if the professors had taken turns, and the second professor had waited until the first was done with the chalkboard. You can solve most concurrency issues by restricting data access to one thread at a time. A lock, or mutex (from “mutually exclusive”), is the way you make your threads take turns. A lock has two states: acquired and released. A thread must acquire the lock before it is allowed to access the data. When the thread is done, it releases the lock. If a lock has been acquired, other threads must wait, or block, until the lock is released before they can acquire it.
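To make the chalkboard problem concrete: counter += 1 is a read-modify-write, so two threads can read the same old value and one update is lost. A minimal Python 3 sketch of the fix, wrapping the shared update in a threading.Lock so only one thread touches the counter at a time (the counter and thread count are arbitrary choices for illustration):

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    global counter
    for _ in range(n):
        # Only one thread at a time may run the read-modify-write below.
        counter_lock.acquire()
        try:
            counter += 1
        finally:
            counter_lock.release()


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```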


Locking with thread 

To create a new lock, call thread.allocate_lock(), which returns a lock object.
To acquire the lock, call the method acquire([waitflag]). Call acquire(1) to wait for the lock as long as necessary. Call acquire(0) to return immediately if the lock isn’t available. If you pass a value for waitflag, acquire() returns 1 if it acquired the lock, 0 if it didn’t. If you don’t pass a waitflag, acquire() waits for the lock and returns None.
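The waitflag behavior can be checked interactively. This sketch uses Python 3's _thread module (the renamed thread module); note one difference from the description above: in Python 3, acquire() called without arguments returns True rather than None.

```python
import _thread

lock = _thread.allocate_lock()
print(lock.acquire())   # True: acquired (Python 3 returns True even with no waitflag)
print(lock.acquire(0))  # False: already held, so the non-blocking attempt fails at once
print(lock.locked())    # True
lock.release()
print(lock.locked())    # False
```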

This code snippet uses locks to access some data in a thread-safe way: 

# Acquire the employee lock. Block until acquired.
EmployeeLock.acquire()
# Try to acquire the company lock, and return immediately.
if (CompanyLock.acquire(0)):
    # Do stuff with the company object, then release it:
    CompanyLock.release()
else:
    # Don't do stuff with the company object, because
    # you don't have its lock!
    pass
# Done with the employee data:
EmployeeLock.release()

To release a lock, call its release() method. You can release a lock that was acquired by another thread. If you release a lock that isn’t currently acquired, the exception thread.error is raised.

You can check whether a lock is acquired with the locked() method. When first created, the lock is in its released state.

Use acquire(0), rather than a call to locked(), if you don’t want your code to wait for a lock. For example, the following code may block if another thread grabs the lock between our call to locked() and our call to acquire():

if (not RecordLock.locked()):
    RecordLock.acquire() # We may be here a while!

Locking with threading 

The threading module offers several flavors of concurrency control. The Lock class is a simple wrapper for the lock class of the thread module; most of the other concurrency-control classes are variations on the Lock class.

Lock - simple locking 

When you create a Lock object, it starts in the released state. The Lock object has two methods: acquire() and release(). These methods are wrappers for the acquire() and release() methods in the thread module; see the previous sections for details.

RLock - reentrant locking

RLock (“reentrant lock”) is a variation on Lock, and its acquire() and release() methods have the same syntax. An RLock may be acquired multiple times by the same thread. The RLock keeps track of how many times it has been acquired. Other threads cannot acquire the RLock until the owning thread has called release() once for each call to acquire(). An RLock must be released by the same thread that acquired it.
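The typical reason to reach for an RLock is a locked function that calls another locked function. In this Python 3 sketch (the account helpers are hypothetical), withdraw_with_fee() holds the lock and then calls withdraw(), which acquires it again; with a plain Lock that second acquire would block forever. The with statement is shorthand for acquire() followed by a guaranteed release().

```python
import threading

account_lock = threading.RLock()
balance = {"value": 100}


def withdraw(amount):
    with account_lock:  # acquire() ... release()
        balance["value"] -= amount


def withdraw_with_fee(amount, fee):
    with account_lock:   # first acquisition by this thread
        withdraw(amount)  # re-acquires the same RLock; a plain Lock would deadlock
        withdraw(fee)


withdraw_with_fee(30, 2)
print(balance["value"])  # 68
```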


Semaphore - n-at-a-time locking 

A semaphore is also a kind of lock. The Semaphore class has acquire() and release() methods with the same syntax as Lock. However, whereas a lock restricts access to one thread at a time, a semaphore may permit access by several threads at a time. A semaphore keeps an internal counter of “available slots.” Releasing the semaphore increases the counter; acquiring the semaphore decreases the counter. The counter is not allowed to go below zero. If the counter is at zero, no thread can acquire the semaphore until it has been released at least once, so threads that try to acquire it block until the semaphore has an available slot. Passing an integer to the Semaphore constructor gives the counter an initial value; it defaults to 1.

For example, assume several threads want to call a function that is memory-intensive. More than one thread can call it at once, but if too many calls happen at once, the system will run out of physical memory and slow down. You could limit the number of simultaneous calls with a semaphore:

from threading import Semaphore

# Create a semaphore, for later use:
MemorySemaphore = Semaphore(MAXIMUM_CALLERS)

# This is a safe wrapper for the function MemoryHog:
def SafeMemoryHog():
    MemorySemaphore.acquire(1)
    try:
        MemoryHog()
    finally:
        MemorySemaphore.release()
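To watch the semaphore do its job, the sketch below replaces MemoryHog with a hypothetical function that just sleeps and records how many callers overlap. It assumes Python 3 and a limit of two simultaneous callers; six threads compete, but the recorded peak never exceeds the limit.

```python
import threading
import time

MAXIMUM_CALLERS = 2  # hypothetical limit for this demo
memory_semaphore = threading.Semaphore(MAXIMUM_CALLERS)
stats_lock = threading.Lock()
active = 0
peak = 0


def memory_hog():
    # Stand-in for an expensive call; records how many callers overlap.
    global active, peak
    with stats_lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)
    with stats_lock:
        active -= 1


def safe_memory_hog():
    with memory_semaphore:  # acquire() ... release()
        memory_hog()


threads = [threading.Thread(target=safe_memory_hog) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # never more than MAXIMUM_CALLERS
```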


Event - simple messages between threads

An event lets one thread block until triggered by another. The Event class has an internal true/false flag that is initially false. This flag is similar to a traffic light, where false means stop and true means go. You can check the flag’s value with the isSet() method, set it to true with set(), and set it to false with clear(). Calling clear() is like a stop sign to other threads; calling set() is a go sign.

You can make a thread wait until the flag is true. Call the event’s wait() method to block until the event’s flag is set. You can pass a number to wait(), indicating a timeout in seconds. For example, if the flag is not set within 2.5 seconds, a call to wait(2.5) will return anyway.

For example, this code snippet is part of a script that munges data. The munging 
can be stopped and started by setting the global Event object, MungeEvent: 

def StopMunging():
    MungeEvent.clear() # stop!

def StartMunging():
    MungeEvent.set() # go!

def MungeData():
    while (1):
        MungeEvent.wait() # wait until we get the green light
        MungeOneRecord()
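A self-contained Python 3 variant of the munging example; the records and the per-record work (upper-casing a string) are placeholders. The worker blocks until the main thread calls set(), and the last two calls show what wait() returns with and without a timeout:

```python
import threading
import time

munge_event = threading.Event()  # the flag starts out false: red light
processed = []


def munge_data(records):
    for record in records:
        munge_event.wait()  # block while the light is red
        processed.append(record.upper())  # hypothetical per-record work


worker = threading.Thread(target=munge_data, args=(["a", "b", "c"],))
worker.start()
time.sleep(0.1)
print(processed)                    # []: still waiting for set()
munge_event.set()                   # green light
worker.join()
print(processed)                    # ['A', 'B', 'C']
print(munge_event.wait(2.5))        # True: the flag is already set
print(threading.Event().wait(0.1))  # False: timed out, flag never set
```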


Condition - wrapper for a lock 

The Condition class wraps a lock. You can pass a Lock object to the Condition constructor; if you don’t, it creates an RLock internally. A condition object has acquire() and release() methods, which wrap those of the underlying lock object.

Condition objects have other methods: wait([timeout]), notify(), and notifyAll(). A thread must acquire the lock before calling these methods.

A call to the wait([timeout]) method immediately releases the lock, blocks until notification is received (when another thread calls notify() or notifyAll()), acquires the lock again, and then returns. If you supply a value for timeout, the call stops waiting for notification after that many seconds; even then, it reacquires the lock before returning.

You call notify to wake up other threads that have called wait on the condition. If there are waiting threads, notify awakens at least one of them. (Currently, notify never awakens more than one, but this is not guaranteed for future versions.) notifyAll wakes up all the waiting threads.
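A common use of Condition is a producer handing items to a consumer. This Python 3 sketch uses a plain list as the shared buffer (an illustrative choice); the consumer re-checks its condition in a while loop around wait(), because by the time it reacquires the lock another thread may have changed the data.

```python
import threading

items = []
items_available = threading.Condition()  # wraps a new RLock by default
consumed = []


def consumer(count):
    for _ in range(count):
        with items_available:        # acquire the underlying lock
            while not items:         # re-check after every wakeup
                items_available.wait()  # releases the lock while blocked
            consumed.append(items.pop(0))


def producer(values):
    for v in values:
        with items_available:
            items.append(v)
            items_available.notify()  # wake one waiting consumer


c = threading.Thread(target=consumer, args=(3,))
c.start()
producer([1, 2, 3])
c.join()
print(consumed)  # [1, 2, 3]
```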


A Word of warning 

When threads share data, examine every piece of data to ensure that thread interaction can’t put the data into an invalid state. A program that is not thread-safe may work for months, waiting for a dramatic time to fail. Eternal vigilance is the price of multithreading!


Preventing Deadlock

Assume one thread acquires a lock, but hangs without releasing it. Now, no other threads can acquire that lock. If another thread waits for the lock (by calling the acquire(1) method), it will be frozen, waiting for the lock forever. This state is called deadlock. Deadlock is not as sneaky a bug as some concurrency issues, but it’s definitely not good!
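One way to make a stuck lock visible instead of freezing silently is to give acquire() a timeout. Timeouts are a Python 3.2+ addition (the older acquire(1) has no such option), so treat this as a modern-Python sketch; it simulates the hung thread by acquiring the lock and never releasing it.

```python
import threading

stuck_lock = threading.Lock()
stuck_lock.acquire()  # simulate a thread that took the lock and never released it


def try_to_work(results):
    # acquire() with no timeout would wait here forever (deadlock);
    # a timeout lets the thread notice and report the problem instead.
    if stuck_lock.acquire(timeout=0.2):
        try:
            results.append("worked")
        finally:
            stuck_lock.release()
    else:
        results.append("gave up")


results = []
t = threading.Thread(target=try_to_work, args=(results,))
t.start()
t.join()
print(results)  # ['gave up']
```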

The section of code between acquiring a lock and releasing it is called a critical section. To guard against deadlock, there should be only one code path into the critical section, and only one way out. The critical section should be as short as possible, to prevent deadlock bugs from creeping in. In addition, the critical section should execute as quickly as possible. Other threads may be waiting for the lock, and if they spend a long time waiting, nothing happens in parallel and the benefits of multithreading evaporate.

It is generally good practice to put a try...finally clause in a critical section, where the finally clause releases the lock. For example, here is a short function that tags a string with the current thread ID and writes the string to a file:

import thread

LogFileLock = thread.allocate_lock()

def LogThreadMessage(LogFile, Message):
    LogFileLock.acquire(1)
    try:
        LogFile.write(str(thread.get_ident()) + " " + Message + "\n")
    finally:
        # If we do not release the lock, then another thread
        # might wait forever to acquire it:
        LogFileLock.release()
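In Python 2.6 and later, a lock can also be used in a with statement, which acquires it on entry and releases it on exit even if an exception is raised: the same guarantee as the try...finally version above. A Python 3 sketch of LogThreadMessage in that style, writing to an in-memory io.StringIO buffer (an assumption made so the example needs no real file):

```python
import io
import threading

log_file_lock = threading.Lock()


def log_thread_message(log_file, message):
    # "with lock:" acquires the lock and guarantees release, like try...finally.
    with log_file_lock:
        log_file.write("%d %s\n" % (threading.get_ident(), message))


buffer = io.StringIO()
threads = [threading.Thread(target=log_thread_message, args=(buffer, "msg%d" % i))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
lines = buffer.getvalue().splitlines()
print(len(lines))  # 5
```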


Example: Downloading from Multiple URLs 

The following code is a more complex example of some of the features covered in this chapter. The script retrieves files from a list of target URLs. It spawns several threads, each of which retrieves one file at a time. Multithreading makes the whole process faster, because thread A can receive data while thread B is waiting for a response from the server.

Cross-Reference: See "Multitasking Without Threads" in Chapter 13 for an alternate solution to this problem using the asyncore and select modules.

We wrote two versions of the URLGrabber script: one using the threading module (see Listing 26-1), and one using thread (see Listing 26-2). Most of the code is the same; unique code is bolded in the source listing.


Listing 26-1: URLGrabber (threading version)


# URLGrabber retrieves a list of target files from the network to
# the local disk. Each file has a particular URL, indicating where
# and how we should request it.
#
# This version of URLGrabber uses the threading module. It uses
# a Lock object to limit access to the URLList. (Without this lock,
# two threads might both grab the same URL.)
import threading
import urllib
import urlparse
import os
import sys
import traceback

# The optimal number of worker threads depends on one's bandwidth
WORKER_COUNT = 4

# Default filename of our URL list
URL_FILE_NAME = "MyURLList.txt"

""" A WorkerThread downloads files from URLs to disk. Each
WorkerThread object corresponds to a single flow of control.
WorkerThread is a subclass of Thread, overriding the run()
and __init__() methods. (Thread's other methods should
not be overridden!) """
class WorkerThread(threading.Thread):
    def __init__(self, URLList, URLListLock):
        # Call the Thread constructor (Python subclasses do *not*
        # automatically invoke parent class constructors)
        threading.Thread.__init__(self)
        # Cache references...
        self.URLList = URLList
        self.URLListLock = URLListLock

    """ Acquire the URLList lock, grab the first URL from the list,
    cross the URL off the list, and release the URLList lock.
    (This code could be part of run(), but it's good to put critical
    sections in their own function.) """
    def GrabNextURL(self):
        self.URLListLock.acquire(1) # 1 means we block
        if (len(self.URLList) < 1):
            NextURL = None
        else:
            NextURL = self.URLList[0]
            del self.URLList[0]
        self.URLListLock.release()
        return NextURL

    """ We override Thread's run() method with one that does
    what we want. Namely: Take URLs from the list, and retrieve them.
    When we run out of URLs, the function returns, and the thread dies. """
    def run(self):
        while (1):
            NextURL = self.GrabNextURL()
            if (NextURL == None):
                # The URL list is empty! Exit the loop.
                break
            try:
                self.RetrieveURL(NextURL)
            except:
                self.LogError(NextURL)

    def RetrieveURL(self, NextURL):
        # Strip the trailing newline that readlines() leaves on each URL
        NextURL = NextURL.strip()
        # urlparse splits a URL into pieces;
        # piece #2 is the file path
        FilePath = urlparse.urlparse(NextURL)[2]
        FilePath = FilePath[1:] # strip leading slash
        # If file name is blank, invent a name
        if (FilePath == "" or FilePath[-1] == "/"):
            FilePath = FilePath + "index.html"
        # Create subdirectories as necessary.
        (Directory, FileName) = os.path.split(FilePath)
        try:
            os.makedirs(Directory) # make directories as needed
        except OSError:
            # os.makedirs raises an exception if the directory exists
            # (or if Directory is empty). We ignore the exception.
            pass
        LocalPath = os.path.normpath(FilePath)
        urllib.urlretrieve(NextURL, LocalPath)

    def LogError(self, URL):
        print "Error retrieving URL:", URL
        # Quick-and-dirty error logging: This code prints the
        # stack-trace that you see normally when an unhandled
        # exception crashes your script.
        (ErrorType, ErrorValue, ErrorTB) = sys.exc_info()
        print "\n\n***ERROR:"
        traceback.print_exception(ErrorType, ErrorValue, ErrorTB)

# Main function
if __name__ == '__main__':
    # Open the URL-list file. Take the first command-line
    # argument, or just use the hard-coded name.
    if (len(sys.argv) >= 2):
        URLFileName = sys.argv[1]
    else:
        URLFileName = URL_FILE_NAME
    try:
        URLFile = open(URLFileName)
        URLList = URLFile.readlines()
        URLFile.close()
    except:
        print "Error reading URLs from:", URLFileName
        sys.exit(1)
    # Create some worker threads, and start them running
    URLListLock = threading.Lock()
    WorkerThreadList = []
    for X in range(0, WORKER_COUNT):
        NewThread = WorkerThread(URLList, URLListLock)
        NewThread.setName(str(X))
        WorkerThreadList.append(NewThread)
        # call start() to spawn a new thread (not run()!)
        NewThread.start()
    # Wait for each worker in turn, then exit.
    # join() is the "vulture method" - it waits until the thread dies
    for X in range(0, WORKER_COUNT):
        WorkerThreadList[X].join()
Listing 26-2: URLGrabber (thread version)


# URLGrabber retrieves a list of target files from the network to
# the local disk. Each file has a particular URL, indicating where
# and how we should request it. Several threads run in parallel,
# each downloading one file at a time. Multithreading makes the
# whole process faster, since thread A can receive data while
# thread B is waiting on the server.
#
# This version of URLGrabber uses the thread module. It uses
# a lock to limit access to the URLList. (Without this lock,
# two threads might both grab the same URL.)
import thread
import urllib
import urlparse
import os
import sys
import traceback

# The optimal number of worker threads depends on one's bandwidth
WORKER_COUNT = 4

# Default filename of our URL list
URL_FILE_NAME = "MyURLList.txt"

""" A WorkerThread downloads files from URLs to disk. Each
WorkerThread object corresponds to a single flow of control. """
class WorkerThread:
    def __init__(self, URLList, URLListLock):
        # Cache references...
        self.URLList = URLList
        self.URLListLock = URLListLock

    """ Acquire the URLList lock, grab the first URL from the list,
    cross the URL off the list, and release the URLList lock.
    (This code could be part of run(), but it's good to put critical
    sections in their own function.) """
    def GrabNextURL(self):
        self.URLListLock.acquire(1) # 1 means we block
        if (len(self.URLList) < 1):
            NextURL = None
        else:
            NextURL = self.URLList[0]
            del self.URLList[0]
        self.URLListLock.release()
        return NextURL

    """ run() is the target function of our worker threads. """
    def run(self):
        while (1):
            NextURL = self.GrabNextURL()
            if (NextURL == None):
                # The URL list is empty! Exit the loop.
                break
            try:
                self.RetrieveURL(NextURL)
            except:
                self.LogError(NextURL)
        # Tell the main thread that this worker is finished
        DecrementThreadCount()

    def RetrieveURL(self, NextURL):
        # Strip the trailing newline that readlines() leaves on each URL
        NextURL = NextURL.strip()
        # urlparse splits a URL into pieces;
        # piece #2 is the file path
        FilePath = urlparse.urlparse(NextURL)[2]
        FilePath = FilePath[1:] # strip leading slash
        # If file name is blank, invent a name
        if (FilePath == "" or FilePath[-1] == "/"):
            FilePath = FilePath + "index.html"
        # Create subdirectories as necessary.
        (Directory, FileName) = os.path.split(FilePath)
        try:
            os.makedirs(Directory) # make directories as needed
        except OSError:
            # os.makedirs raises an exception if the directory exists
            # (or if Directory is empty). We ignore the exception.
            pass
        LocalPath = os.path.normpath(FilePath)
        urllib.urlretrieve(NextURL, LocalPath)

    def LogError(self, URL):
        print "Error retrieving URL:", URL
        # Quick-and-dirty error logging: This code prints the
        # stack-trace that you see normally when an unhandled
        # exception crashes your script.
        (ErrorType, ErrorValue, ErrorTB) = sys.exc_info()
        print "\n\n***ERROR:"
        traceback.print_exception(ErrorType, ErrorValue, ErrorTB)

def DecrementThreadCount():
    global WorkerThreadCount
    ThreadCountLock.acquire()
    WorkerThreadCount = WorkerThreadCount - 1
    if (WorkerThreadCount < 1):
        # The last worker to exit releases the main thread lock
        MainThreadLock.release()
    ThreadCountLock.release()

def IncrementThreadCount():
    global WorkerThreadCount
    ThreadCountLock.acquire()
    WorkerThreadCount = WorkerThreadCount + 1
    ThreadCountLock.release()

# Main function
if __name__ == '__main__':
    # Open the URL-list file. Take the first command-line
    # argument, or just use the hard-coded name.
    if (len(sys.argv) >= 2):
        URLFileName = sys.argv[1]
    else:
        URLFileName = URL_FILE_NAME
    try:
        URLFile = open(URLFileName)
        URLList = URLFile.readlines()
        URLFile.close()
    except:
        print "Error reading URLs from:", URLFileName
        sys.exit(1)
    # Create some worker threads, and start them running
    URLListLock = thread.allocate_lock()
    ThreadCountLock = thread.allocate_lock()
    # We acquire the MainThreadLock. The last worker thread to exit
    # releases the lock, so that we can acquire it again (and exit)
    MainThreadLock = thread.allocate_lock()
    MainThreadLock.acquire()
    WorkerThreadCount = 0
    # Count every worker before any of them starts; otherwise a worker
    # that finishes early could drop the count to zero and release
    # MainThreadLock while other workers are still running.
    for X in range(0, WORKER_COUNT):
        IncrementThreadCount()
    for X in range(0, WORKER_COUNT):
        NewThread = WorkerThread(URLList, URLListLock)
        thread.start_new_thread(NewThread.run, ())
    # This call will block until the last thread releases the main
    # thread lock in DecrementThreadCount().
    MainThreadLock.acquire()


Porting Threaded Code 

Not all operating systems include support for multithreading; an OS may multitask without including native thread support. Note some minor differences in how threading works on different platforms:

♦ On most platforms, child threads are immediately killed (without executing object destructors or try...finally clauses) when the main thread exits. However, child threads keep running on SGI IRIX. We recommend terminating all threads before the main thread exits anyway, to ensure proper cleanup.

♦ Signals generally go to the main thread. Therefore, if your script handles signals, the main thread should not block. In particular, if you are using Tkinter, you should run mainloop() from the main thread. However, on platforms where the signal module is not available, signals go to an arbitrary thread.
Weaving Threads Together with Queues

The Queue module defines a thread-safe queue class (Queue.Queue). A queue is similar to a list whose items come out in the order they went in (first in, first out), but it handles concurrency control internally, which saves you the bother of handling it in your code.

Call the constructor Queue.Queue(maxsize) to create a queue. If maxsize is greater than zero, the queue will be limited to that many elements; otherwise, the length of the queue is unlimited.

To add an item to the queue, call put(item[, block]). The method get([block]) removes and returns the next item in the queue. Setting block to 1 (the default) makes these methods wait until they can successfully add or retrieve an item.

Setting block to 0 causes get and put to return immediately. You can also use the synonym methods get_nowait() and put_nowait(item). A nonblocking call to put raises the exception Queue.Full if the queue is full. (If the queue’s length is not limited, put will always succeed.) A nonblocking call to get raises the exception Queue.Empty if no items are on the queue.

The Queue class includes some methods to inspect the queue. qsize() returns the queue length. empty() returns true if the queue is empty, and false otherwise. full() returns true if the queue is full, and false otherwise. Be careful: other threads may have touched the queue while you were inspecting it! Therefore, you must take the output of these methods with a grain of salt. For instance, the following code may raise a Queue.Empty exception:

if (not MyQueue.empty()):
    FirstItem = MyQueue.get(0) # unsafe!

To access a queue safely without blocking, call get_nowait() and put_nowait() directly, and catch any Full or Empty exceptions.
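In Python 3 this module is named queue (lowercase). The sketch below shows the usual worker pattern: three threads pull numbers from a bounded queue and square them, and a None sentinel per worker (a convention, not part of the Queue API) tells each one to stop. Because the queue does its own locking, only the shared results list needs an explicit lock.

```python
import queue
import threading

work = queue.Queue(maxsize=10)  # bounded: put() blocks when 10 items are waiting
results = []
results_lock = threading.Lock()


def worker():
    while True:
        item = work.get()  # blocks until an item is available
        if item is None:   # sentinel: no more work
            break
        with results_lock:
            results.append(item * item)


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for n in range(1, 6):
    work.put(n)
for _ in threads:
    work.put(None)  # one sentinel per worker
for t in threads:
    t.join()
print(sorted(results))  # [1, 4, 9, 16, 25]

# Non-blocking access: catch queue.Empty instead of checking empty() first.
try:
    work.get_nowait()
except queue.Empty:
    print("nothing left")
```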

Technical Note: How Simultaneous Is Simultaneous?

A CPU can only handle one flow of control at a time. Computers switch between processes quickly, so in a single second, a processor may execute some instructions for thread A and thread B. To the user, the threads appear to run at the same time. We say that these programs are “simultaneous,” although they are actually taking turns. On a multiprocessor machine, threads can be literally simultaneous: CPU 1 is running program A at the same instant that CPU 2 is running program B.
Currently, Python maintains a global interpreter lock, so that it executes only one Python thread at once. The disadvantage here is that, even on a multiprocessor machine, a particular Python program can execute Python code on only one processor at a time. This limitation doesn’t matter much (especially if you’re using a single-processor machine!), but you may want to work around it for performance reasons. A C extension can create parallel dummy threads, as long as those threads do not manipulate Python objects directly. Alternatively, you can run separate processes if your program’s work can be cleanly split.


For More Information 

The Python threading SIG is a group working to document and improve the state of 
threading in Python; mailing list archives are available. See the Python SIG page at 
http://www.python.org/sigs/. 

Stackless Python is an alternate implementation of the Python interpreter that sup- 
ports, among other things, microthreads, or ultra-lightweight processes, which 
enable your program to handle hundreds or even thousands of threads without 
getting bogged down just switching between them. Visit www.stackless.org for 
more information. 


Summary 

Threads enable your programs to perform multiple tasks at once. Untamed threads 
can break one another’s data, but Python’s locking mechanisms let you direct 
threads to work together. In this chapter, you: 

- Created threads with the thread and threading modules. 

- Controlled data access with locks and semaphores. 

- Built easy thread-safe code with the Queue class. 

In the next chapter, you’ll learn tools and techniques to help you debug and profile 
your Python applications. 


CHAPTER 27 

Debugging, Profiling, and Optimization 

Bugs can surface in the best of code, often at the worst 
possible times. Fortunately, Python features a debugger 
to help squash bugs. You can also use Python’s profiler to 
identify bottlenecks in your code. A few optimization tricks 
can go a long way toward speeding up a sluggish script. 


Debugging Python Code 

Adding print statements is no substitute for stepping 
through code. The Python debugger, pdb, lets you set 
breakpoints, examine and set variables, and view source code. pdb 
is similar to the C/C++ debugger gdb (which, in turn, was 
based on xdb), so the gdb veterans in the audience will 
recognize most commands. Most commands can be written in a 
long way or a short way. For reference, this chapter lists them 
in the following form: long way (abbreviation). For 
example: continue (c). A list of commands is also available within 
pdb by typing help (h). 

Cross-Reference: See Appendix B for a guide to debugging under IDLE and 
PythonWin. Both provide excellent debuggers that are 
more powerful than pdb. 
Starting and stopping the debugger 

To use the debugger, import the module pdb. Then, you can 
start the debugger by calling pdb.run(statement[, globals[, 
locals]]). Here, statement is code to execute (as a 
string). You can run in a particular context by passing global 
and local namespace dictionaries for globals and locals. The 
debugger will stop and wait for input before actually running 
the code; this is a handy time to set breakpoints: 
>>> import pdb 
>>> pdb.run("import DebugMe") 
> C:\PYTHON20\<string>(0)?() 
(Pdb) print "I can execute arbitrary commands at this prompt!" 
I can execute arbitrary commands at this prompt! 
(Pdb) fred=25 
(Pdb) fred 
25 

You can do whatever you want from the pdb prompt. In addition, pdb provides some 
useful special commands, described next. 

The function runeval is the same as run, except that runeval evaluates an 
expression and returns its value. The function runcall(function[, 
arguments...]) executes the function function in the debugger, passing along 
any arguments to the function. It returns the return value of function. 

The function post_mortem(traceback) enters post-mortem debugging of a 
particular traceback. The function pm starts debugging the most recent traceback; it is a 
synonym for post_mortem(sys.last_traceback). 

The function set_trace enters the debugger immediately. It is a useful function to 
put in code that encounters (or raises) an AssertionError. 

To get back out of the debugger, use the command quit (q). 

Examining the state of things 

The all-important command list (l) shows the source code you are debugging. 
Use list FirstLine, LastLine to list a range of lines (by default, pdb shows up to 
five lines above and five lines below the current line). 

The command where (w) shows the current stack trace, while up (u) and down (d) 
move through the stack. (Note that running w under an IDLE shell shows about 10 
extra stack frames, because IDLE is running above your code.) 

You can display a variable’s value with print (p). 

For example, here is a simple debugging session. Looking at the code, plus some 
variables in context, gives me a pretty good idea of what went wrong: 

>>> FancyPrimeFinder.FindPrimes(100) 
Traceback (innermost last): 
  File "<pyshell#19>", line 1, in ? 
    FancyPrimeFinder.FindPrimes(100) 
  File "C:\PYTHON20\FancyPrimeFinder.py", line 9, in FindPrimes 
    NumList=filter(lambda y,x=NumList[Index]: 
IndexError: list index out of range 
>>> import pdb 
>>> pdb.pm() # Post mortem! 
> C:\PYTHON20\FancyPrimeFinder.py(9)FindPrimes() 
-> NumList=filter(lambda y,x=NumList[Index]: 
(Pdb) l 
  4         NumList = range(2,EndNumber) 
  5         Index=0 
  6         while (Index<len(NumList)): 
  7             Index += 1 
  8  ->         NumList=filter(lambda y,x=NumList[Index]: 
  9                 (y<=x or y%x!=0), NumList) 
 10         return NumList 
[EOF] 
(Pdb) p NumList 
[2, 3, 4, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 
59, 61, 67, 71, 73, 79, 83, 89, 97] 
(Pdb) Index 
26 
(Pdb) p len(NumList) 
26 
(Pdb) p NumList[Index] 
*** IndexError: <exceptions.IndexError instance at 0098E714> 
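The session points at the problem. Index is incremented before it is used, so the loop never filters with NumList[0] (the 2), which is why 4 survives. On its final pass, the loop also reads one element past the end of the list. Below is a minimal Python 3 sketch of one possible fix. The name find_primes and the list comprehension (in place of filter and lambda) are illustrative choices, not the original FancyPrimeFinder code; the listing does not show that code's first three lines.

```python
def find_primes(end_number):
    """Return the primes below end_number by repeated filtering.

    Same approach as FindPrimes in the session, with the off-by-one fixed:
    read num_list[index] first, then advance index.
    """
    num_list = list(range(2, end_number))
    index = 0
    while index < len(num_list):
        x = num_list[index]
        # Keep x itself, and every number that x does not divide.
        num_list = [y for y in num_list if y <= x or y % x != 0]
        index += 1  # advance only after num_list[index] has been used
    return num_list
```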


Setting breakpoints 

The break (b) command handles breakpoints. You set breakpoints in two ways: 
break [name:]index sets a breakpoint on line index of file name. break function[, 
condition] sets a breakpoint on the specified function, but only when condition is 
true. Breakpoints are given sequential ID numbers, starting with 1. Running break 
(b) with no arguments prints a list of the current breakpoints: 

(Pdb) b 
(Pdb) b FancyPrimeFinder.py:9 
Breakpoint 1 at C:\PYTHON20\FancyPrimeFinder.py:9 
(Pdb) b 
Num Type         Disp Enb   Where 
1   breakpoint   keep yes   at C:\PYTHON20\FancyPrimeFinder.py:9 

Use clear (cl) to clear breakpoints. Pass their ID numbers, or just type cl to clear 
them all. Similarly, use disable to disable breakpoints. You can re-enable a 
breakpoint with enable (but a cleared breakpoint is gone forever). 

The command ignore id [count] ignores breakpoint id up to count times. The 
command tbreak, with the same arguments as break, sets a temporary 
breakpoint, which is automatically cleared the first time it is hit. Finally, the command 
condition id [expr] attaches the condition expr to breakpoint id; if expr is 
omitted, the breakpoint becomes unconditional. 
Running 

The command continue (c) tells pdb to start running the program again. The 
command return (r) keeps running until the current function returns. The commands 
next (n) and step (s) execute only the current statement. The difference between 
the two is that step “steps into” functions (it breaks inside the function call), and 
next “steps over” function calls (it runs the whole function call, and then breaks on 
the next line of the current source file). 

Aliases 

The command alias [name [command]] creates an alias, name, which executes 
command. The alias can take arguments. These arguments replace %1, %2, and so 
on, in command, while %* is replaced by all the arguments. Calling alias without 
passing a command shows the current command for name; calling alias with no 
arguments lists the current aliases. Aliases can be nested. They only apply to the 
first word typed at the pdb command line. 

For example, here is an alias to print an object’s members, and a shortcut for 
printing the members of self: 

(Pdb) alias pi for k in %1.__dict__.keys(): print "%1."+k+"="+str(%1.__dict__[k]) 
(Pdb) alias ps pi self 
(Pdb) pi TempFile 
TempFile.BackupFileName=C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\~3400-1 
TempFile.File=<open file 'fred.txt', mode 'w' at 008053D8> 
TempFile.FileName=fred.txt 

You can put alias definitions (or any other pdb commands) into a file named .pdbrc 
in your home directory or the current directory. pdb will execute the commands 
from .pdbrc on startup. If .pdbrc files exist in your home directory and the current 
directory, the home directory’s .pdbrc executes first, followed by the local file. 

Debugging tips 

Bugs in destructors can be especially hard to track down. Any exceptions thrown in 
a destructor are spewed to stderr and ignored. Therefore, destructors are a great 
place to call pdb.set_trace: 

def __del__(self): 
    try: 
        self.cleanup() 
    except: 
        # If we don't catch it, NO ONE CAN! 
        pdb.set_trace() 
If an object is still around when the program finishes running, its destructor may 
execute after the Python interpreter has freed any imported modules. Therefore, an 
innocent-looking call to os.remove may result in the error “‘None’ object has no 
attribute ‘remove’”. A trick that sometimes works is to prefix a module-level 
variable with an underscore; such items are destroyed before other members. Safest of 
all is not to do anything too clever in destructors, unless you carefully get rid of 
objects as you go. 


Working with docstrings 

Documentation helps people use each other’s Python modules. But documentation 
often becomes out-of-date, which is sometimes worse than no documentation at all! 
By using docstrings, you can maintain code and documentation in one place. You 
can also use the pydoc module to extract your code’s docstrings into 
professional-looking text or HTML documentation, so that people can use your modules without 
ever needing to read code. 

You can use pydoc interactively. Call pydoc.help(object) to view Python 
documentation for an object. This can be much more convenient than leaving the 
interpreter to read documentation. For example: 

>>> pydoc.help(string.strip) 
Help on function strip in module string: 

strip(s) 
    strip(s) -> string 

    Return a copy of the string s with leading and trailing 
    whitespace removed. 
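pydoc.help prints its output directly. If you want the same text as a string, for example to save it or to check it in a test, Python 3's pydoc.render_doc returns it. This is a minimal sketch assuming Python 3. strip_spaces is a hypothetical function that exists only to give pydoc something to document:

```python
import pydoc

def strip_spaces(s):
    """Return a copy of s with leading and trailing whitespace removed."""
    return s.strip()

# render_doc builds the text that help() would display; the plaintext
# renderer leaves out the backspace-based bold formatting used on terminals.
text = pydoc.render_doc(strip_spaces, renderer=pydoc.plaintext)
print(text)
```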

You can also use pydoc from the command line. To view module documentation as 
text, pass the module name as an argument, like this: 

python pydoc.py urllib 

Or, use the -w argument to write out documentation to an HTML file. For example, 
this command writes HTML documentation of urllib to the file urllib.html: 

python pydoc.py -w urllib 

The pydoc module has one more trick up its sleeve: Run it with no command line 
arguments, and it will run as a documentation Web server. You can read 
documentation for all the modules in your PYTHONPATH, all from the comfort of your browser! 

New Feature: The pydoc module is new in Python 2.1. (However, it runs on versions 1.5 and 
up.) 
Automating Tests 

Testing code is not as fun as writing code. But testing is essential to avoid 
poor-quality code. Luckily, Python comes with tools to help you build automated tests. 
The unittest module (also known as PyUnit) is a framework for testing your code. 
The doctest module helps you keep documentation and code in synch. 

New Feature: Both doctest and unittest are new in Python 2.1. 

Synching docstrings with code 

The doctest module helps you defend against out-of-date documentation. To use 
doctest, copy the text of a successful interpreter session and then paste it into 
your code’s docstrings. Later, run doctest.testmod(module) to re-run that 
interpreter session, and make sure that the output is the same. 

For example, suppose I am parsing some comma-delimited files that I exported from 
Microsoft Excel. Normally, I could use string.split to split a line into fields. But 
Excel uses some special rules to deal with commas within data. So, I write a 
function called SplitCommaFields to split fields, and test it in the interpreter. It 
works; so far, so good. To make sure my code’s documentation doesn’t become 
out-of-date, I copy my interpreter session into the docstring. Listing 27-1 shows the 
resulting file: 


Listing 27-1: CSV.py 


import doctest 
import CSV # Import ourselves! 

def SplitCommaFields(Line): 
    """ 
    SplitCommaFields breaks up a comma-delimited .csv file into 
    fields: 

    >>> SplitCommaFields('a,b,c') 
    ['a', 'b', 'c'] 

    It handles commas within fields: 

    >>> SplitCommaFields('Atlas shrugged,"Rand,Ayn",1957') 
    ['Atlas shrugged', 'Rand,Ayn', '1957'] 

    Also, it handles double-quotes within fields: 

    >>> SplitCommaFields('"Are ""you"" happy?","Stuff,is,fun"') 
    ['Are "you" happy?', 'Stuff,is,fun'] 
    """ 
    Fields=Line.split(",") 
    RealFields=[] 
    InsideQuotes=0 
    BigField="" 
    for Field in Fields: 
        if InsideQuotes: 
            BigField+=","+Field 
            if BigField[-1]=='"': 
                BigField=BigField[:-1] # kill trailing " 
                RealFields.append(BigField) 
                InsideQuotes=0 
        elif len(Field)==0 or Field[0]!='"': 
            RealFields.append(Field) 
        else: # we saw a startquote 
            if (Field[-1]=='"'): 
                RealFields.append(Field[1:-1]) 
            else: 
                BigField=Field[1:] 
                InsideQuotes=1 
    # Collapse doubled quotes ("") into a single quote character: 
    return map(lambda x:x.replace('""','"'), RealFields) 

if __name__=="__main__": 
    doctest.testmod(CSV) # Test this module 


When I run CSV.py from the command line, I get no output, indicating that my 
function still runs as documented. As a sanity check, I can pass the -v argument to see 
doctest do its work: 

C:\source\test>python CSV.py -v 
Running CSV.__doc__ 
0 of 0 examples failed in CSV.__doc__ 
Running CSV.SplitCommaFields.__doc__ 
Trying: SplitCommaFields('Atlas shrugged,"Rand,Ayn",1957') 
Expecting: ['Atlas shrugged', 'Rand,Ayn', '1957'] 
ok 
[...deleted for brevity...] 
3 passed and 0 failed. 
Test passed. 
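Listing 27-1 targets Python 2.1. In Python 3, map returns an iterator, so the docstring examples would print a map object instead of a list. Below is a minimal Python 3 sketch of the same function that returns a real list. It adds a helper that runs the docstring examples through doctest's finder and runner classes. The names split_comma_fields and run_doctests are hypothetical; for a whole module, doctest.testmod() is the usual entry point.

```python
import doctest

def split_comma_fields(line):
    """Split one line of Excel-style comma-separated text into fields.

    >>> split_comma_fields('a,b,c')
    ['a', 'b', 'c']
    >>> split_comma_fields('Atlas shrugged,"Rand,Ayn",1957')
    ['Atlas shrugged', 'Rand,Ayn', '1957']
    >>> split_comma_fields('"Are ""you"" happy?","Stuff,is,fun"')
    ['Are "you" happy?', 'Stuff,is,fun']
    """
    real_fields = []
    inside_quotes = False
    big_field = ""
    for field in line.split(","):
        if inside_quotes:
            big_field += "," + field
            if big_field.endswith('"'):
                real_fields.append(big_field[:-1])  # drop the closing quote
                inside_quotes = False
        elif not field.startswith('"'):
            real_fields.append(field)
        elif len(field) > 1 and field.endswith('"'):
            real_fields.append(field[1:-1])  # whole quoted field, no commas
        else:
            big_field = field[1:]  # opening quote; a lone '"' also lands here
            inside_quotes = True
    # Build a list explicitly: in Python 3, map() would return an iterator.
    return [f.replace('""', '"') for f in real_fields]

def run_doctests():
    """Run the examples in split_comma_fields's docstring; return failures."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for test in finder.find(split_comma_fields,
                            globs={"split_comma_fields": split_comma_fields}):
        failed += runner.run(test).failed
    return failed
```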


Unit testing 

The unittest module is a Python version of Kent Beck’s unit testing framework. It 
belongs to the same illustrious lineage as JUnit and CppUnit. You can use it to build 
one or more test cases for a class or module and group test cases into test suites. 

To build an automated test, create a subclass of unittest.TestCase. Your class 
should override the runTest method to perform some test, using the assert statement to 
flag errors. For example, this class tests the SplitCommaFields function defined 
earlier: 
class CommaContentsTestCase(unittest.TestCase): 
    def runTest(self): 
        Line='one,two,"three,three,thr,ee","fo,ur",five' 
        assert SplitCommaFields(Line)==\ 
               ['one','two','three,three,thr,ee','fo,ur','five'] 

You can run the test interactively by calling the run method of a TestRunner 
object, such as the TextTestRunner: 

>>> TestRunner=unittest.TextTestRunner() 
>>> TestRunner.run(CSV.CommaContentsTestCase()) 

Ran 1 tests in 0.000s 
OK 
<unittest._TextTestResult run=1 errors=0 failures=0> 

You can also run tests from the command line. One method is to change your script 
to call unittest.main(): 

if __name__=="__main__": 
    unittest.main() 

Then, calling your script from the command line will run all its test cases. 


Test suites 

The TestSuite class is a handy way to group related test cases. It provides a 
method, addTest(TestCase), for adding test cases to a list. For example, this 
function returns a suite of test cases: 

def CSVSuite(): 
    Suite=unittest.TestSuite() 
    Suite.addTest(CommaContentsTestCase()) 
    Suite.addTest(QuoteCommentsTestCase()) 
    return Suite 

If you define a function (such as CSVSuite, defined previously) to return a TestCase or 
TestSuite object, you can invoke your unit test(s) from the command line like 
this: 

python unittest.py CSV.CSVSuite 


Repeated testing tasks 

The TestCase class provides setUp and tearDown methods, called before and after 
the main runTest method. These methods help you build test cases without 
repeating the same setup and cleanup steps in your test code. For example, 
suppose you have several tests that must create a temporary file. This base class takes 
care of file creation and cleanup, so that your test cases can freely write to 
self.File in the runTest method: 

import os 
import tempfile 
import unittest 

class FileTestCase(unittest.TestCase): 
    def setUp(self): 
        self.FileName=tempfile.mktemp() 
        self.File=open(self.FileName,"w") 
    def tearDown(self): 
        self.File.close() 
        os.remove(self.FileName) 
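In Python 3, the usual style is to give each test its own test_* method rather than overriding runTest. Each such method gets its own setUp/tearDown pair. Below is a minimal sketch assuming Python 3 that combines the FileTestCase idea with a TestSuite. It switches to tempfile.mkstemp, since tempfile.mktemp is deprecated. WriteTestCase and run_suite are hypothetical names.

```python
import io
import os
import tempfile
import unittest

class FileTestCase(unittest.TestCase):
    """Base class that gives each test a fresh temporary file, self.File."""

    def setUp(self):
        # mkstemp creates the file and returns (descriptor, path).
        fd, self.FileName = tempfile.mkstemp()
        self.File = os.fdopen(fd, "w")

    def tearDown(self):
        self.File.close()
        os.remove(self.FileName)

class WriteTestCase(FileTestCase):
    # Hypothetical tests; each test_* method runs between setUp and tearDown.
    def test_round_trip(self):
        self.File.write("hello\n")
        self.File.flush()
        with open(self.FileName) as f:
            self.assertEqual(f.read(), "hello\n")

    def test_starts_empty(self):
        self.assertEqual(os.path.getsize(self.FileName), 0)

def run_suite():
    """Group both tests into a suite and run it, keeping the report in memory."""
    suite = unittest.TestSuite()
    suite.addTest(WriteTestCase("test_round_trip"))
    suite.addTest(WriteTestCase("test_starts_empty"))
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```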


Finding Bottlenecks 

Python is a high-level language, often used in situations where speed is not crucial. 
Programmer time is usually more expensive than processor time. However, it is 
sometimes important to optimize your Python programs, making them conserve 
time, memory, or some other resource, such as database cursors. 

Keep these rules of thumb in mind when optimizing: 

1. Optimize last. Life is too short to spend time optimizing code that may be 
rewritten or scrapped. 

2. Test your optimizations by timing them on realistic runs. Optimization often 
means some sacrifice of simplicity, readability, or maintainability; it’s best to 
make sure the sacrifice is worth the gains. 

3. Comment all but the most glaringly obvious optimizations. This helps 
innocent bystanders understand your code, and (it is hoped) ensures that no one 
will undo the optimizations for the sake of readability. 


Profiling code 

To quickly profile a statement, import the profile module, and then call 
profile.run(statement). For example, the following code profiles a script that 
sorts MP3 files by artist: 


>>> profile.run("SortMP3s()") 
         30289 function calls (30166 primitive calls) in 10.560 CPU seconds 

   Ordered by: standard name 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function) 
        1    0.029    0.029    9.685    9.685 <string>:1(?) 
      271    0.020    0.000    0.020    0.000 ID3Tag.py:105(__init__) 
        9    0.000    0.000    0.000    0.000 ID3Tag.py:130(theTitle) 
        9    0.000    0.000    0.000    0.000 ID3Tag.py:137(theArtist) 
        1    0.292    0.292    0.704    0.704 ID3Tag.py:20(?) 
        1    0.000    0.000    0.000    0.000 ID3Tag.py:23(ID3Tag) 
      271    0.151    0.001    0.168    0.001 ID3Tag.py:304(Read) 
       45    0.016    0.000    0.016    0.000 ID3Tag.py:333(RemovePadding) 

[...truncated for brevity...] 

Each line of the output corresponds to a function. The columns show the following: 

- ncalls: How many times the function was called. If a function recurses, two 
  numbers are shown: total calls, and then total primitive calls. For instance, 
  the script made one call to os.path.walk, which resulted in 123 other calls: 

  124/1    0.500    0.004    8.862    8.862 ntpath.py:255(walk) 

- tottime: Total CPU time spent in the function, not including its subfunctions. 

- percall: Average CPU time. Equal to tottime divided by ncalls. 

- cumtime: Cumulative CPU time spent in the function and its subfunctions. 

- percall: Average cumulative time. Equal to cumtime divided by the number of 
  primitive calls (as in the walk line above, where 8.862/1 = 8.862). 

- filename:lineno(function): Source file name, line number, and function name. 
  The first line of output corresponds to the code passed to run; its filename is 
  listed as “<string>”. 

Note: When profiling from a Python shell window in IDLE or PythonWin, any code that 
prints to stdout will trigger function calls within the IDE's framework. These 
function calls will show up in the profiler's output! Running Python from the command 
line works around this problem. 


Using Profile objects 

The Profile class provides a run(command) method to profile the specified command. 
Normally, the command runs in the current namespace. To run the command in a 
particular namespace context, pass the global and local dictionaries (as returned by 
the built-in functions globals and locals) to runctx(command, globals, locals). To 
profile a function call, you can call runcall(functionname[, arguments...]). 

After running a command, call the print_stats method to print statistics, or call 
dump_stats(filename) to write out stats (in a non-human-readable format) to the specified 
file. 

A call to profile.run(command[, filename]) creates a Profile object, calls 
run(command), and then calls either print_stats or dump_stats(filename). 

The Profile class can be subclassed. For example, the class HotProfile is a 
subclass of Profile. It calculates less data (ignoring caller-callee relationships), but 
runs faster. 
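Below is a minimal Python 3 sketch of the Profile-plus-pstats workflow that captures the report as a string instead of printing it. slow_fib and profile_call are hypothetical names. The pure-Python profile module is used to match the text; cProfile offers the same interface with lower overhead.

```python
import io
import profile
import pstats

def slow_fib(n):
    # Deliberately slow recursive function, just to give the profiler work.
    return n if n < 2 else slow_fib(n - 1) + slow_fib(n - 2)

def profile_call(func, *args, top=5):
    """Profile func(*args); return (result, report text)."""
    prof = profile.Profile()
    result = prof.runcall(func, *args)
    out = io.StringIO()
    # Stats accepts a Profile object directly; the methods chain because
    # strip_dirs and sort_stats return the Stats object itself.
    pstats.Stats(prof, stream=out).strip_dirs().sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()
```

A call such as profile_call(slow_fib, 15) returns 610 together with a report whose recursive slow_fib line shows the "total/primitive" call counts described above.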
Calibrating the profiler 

There is a small time lag between the time that an event happens and the time that 
the profiler records it. The call to time.clock is not free. This lag adds up over the course 
of many function calls, and can make timing information less accurate. 

Calibrating the profiler compensates for this lag by applying a “fudge factor.” This 
makes the profiler’s statistics more accurate. To calibrate the profiler, call its 
calibrate method: 

>>> import profile 
>>> Prof=profile.Profile() 
>>> "%f" % Prof.calibrate(100000) 
'0.000017' 

The number returned is your fudge factor. To use it, you must edit the library code 
(in lib\profile.py). In the trace_dispatch method, replace the line 

t = t[0] + t[1] - self.t # No Calibration constant 

with this line: 

t = t[0] + t[1] - self.t - (your calibration constant) 

Note: Profiling with calibration is more accurate overall. However, the profiler may 
occasionally report that a negative amount of time was spent in a function. This results 
from the imperfection of the fudge factor, and is not a cause for panic. 
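In Python 3 you no longer need to edit profile.py. Profile accepts the calibration constant as its bias argument, or you can set profile.Profile.bias once for all later instances. Here is a minimal sketch assuming Python 3; calibrated_profiler is a hypothetical helper name:

```python
import profile

def calibrated_profiler(iterations=10000):
    """Estimate per-event profiling overhead; return (bias, Profile using it)."""
    bias = profile.Profile().calibrate(iterations)
    # The Profile object subtracts this constant from each timing event,
    # just as the edited trace_dispatch line does in Python 2.1.
    return bias, profile.Profile(bias=bias)
```

Calibration results vary from run to run. It is worth calling it a few times and checking that the numbers are consistent before trusting one.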

Customizing statistics 

The module pstats provides a class, Stats, for storing and printing statistics 
gathered in a profiling run. 

The Stats constructor takes either a file name or a Profile object. You can either 
pass in a Profile object (after calling run), or pass the name of a stats file created 
by the profiler. You can also pass one (or more) file names or Profile instances to 
Stats.add. 

For example, the following code runs the same command several times, and 
combines the statistics, on the assumption that behavior may vary from one run to the 
next: 

import profile 
import pstats 

def ProfileSeveralRuns(Command,Times): 
    if (Times<1): return 
    StatsFiles=[] 
    for RunIndex in range(Times): 
        FileName="stats%d"%(RunIndex) 
        profile.run(Command, FileName) 
        StatsFiles.append(FileName) 
    # Pass one filename to the constructor: 
    AggregateStats=pstats.Stats(StatsFiles[0]) 
    # Pass along all other filenames: 
    if len(StatsFiles)>1: 
        AggregateStats.add(*(StatsFiles[1:])) 
    AggregateStats.print_stats() 

It is generally a good idea to call strip_dirs to trim the path to each function’s file 
name. 

You can change the ordering of statistics by calling the method 
sort_stats(field[, ...]). Here, field is the name of a field to sort on. You can pass several 
field names; in this case, subsequent fields are used to sort if values of the first field 
are equal. Alphabetic fields are sorted in ascending order; numeric fields are sorted 
in descending order. (The method reverse_order reverses the ordering.) Table 
27-1 lists the available fields. 



Table 27-1 
Stats Field Names 

Name         Meaning 
cumulative   Cumulative time spent in a function 
calls        Total calls to a function 
time         Time spent in a function (not including subfunctions) 
name         Function name 
file         Source filename 
module       Source filename (same meaning as file) 
line         Source line number 
nfl          Name/File/Line. sort_stats("nfl") is the same as 
             sort_stats("name","file","line") 
pcalls       Total primitive (nonrecursive) calls to a function 
stdname      Standard name 


The method print_stats([restrictions...]) prints the statistics. You can 
pass one or more arguments to filter which lines are printed. Pass an integer, n, to 
print only the first n lines. Pass a decimal between 0 and 1 to print that fraction 
of the lines. Or, pass a regular expression (as a string) to print only lines whose file 
name matches the regular expression. 
This example runs some code and then prints statistics for the most 
time-consuming functions: 

>>> Prof=profile.Profile() 
>>> Prof.run("import FancyPrimeFinder") 
<profile.Profile instance at 00854E54> 
>>> MyStats=pstats.Stats(Prof) 
>>> MyStats.sort_stats("time") # expensive functions first 
<pstats.Stats instance at 007E48DC> 
>>> MyStats.strip_dirs().print_stats(10) # top 10 only 

Note that most methods of Profile and Stats return the object itself; this makes 
it easy to chain several method calls in one line, as the last line of the preceding 
code does. 

The method print_callers([restrictions...]) shows all the callers for each 
function. On the left is the called function; on the right is its caller, with the number 
of calls in parentheses. Similarly, print_callees([restrictions...]) shows 
each function in the left column; the functions it called are on the right. 


Common Optimization Tricks 

The following sections outline some ways to speed up Python code. Use these on 
bottleneck code, after you have identified the bottlenecks using the profile 
module. Keep in mind that sometimes the best way to speed up a function is simply to 
write it as an extension module in C. 

Cross-Reference: See Chapter 29 for more information about how you can create C libraries usable 
from your Python programs. 


Sorting 

Sorting a sequence with the sort method is very fast for numbers and strings. If 
you need to perform custom sorting (e.g., a comparison of two objects), you can 
pass a comparison function to sort. You can also customize sorting for a class by 
defining a __cmp__ method. However, passing a function to sort is faster than 
implicit use of the __cmp__ method. Compare the following two lines: 

PointList.sort() # Uses Point.__cmp__ implicitly 
PointList.sort(Point.__cmp__) # Trickier, but faster! 

When sorting a list of objects, one trick is to find a “key” that you can sort on. The 
key values should be an easy-to-sort type (for example, numbers); and they should 
be mostly unique across list entries. The following function provides an example: 
def SortByKey(List,KeyMaker): 
    """Sort a list. The parameter KeyMaker is a function that 
    returns a key for the list element. The keys are used to 
    sort the list.""" 
    # Replace each element x with (KeyMaker(x),x): 
    TempList=map(lambda x,f=KeyMaker: (f(x),x), List) 
    # Tuples are sorted by comparing their first elements first. 
    # If the first elements match, the second elements 
    # are compared; so if KeyMaker(x)==KeyMaker(y), then we 
    # *will* end up comparing x and y directly. 
    TempList.sort() 
    # Get rid of the keys - replace (KeyMaker(x),x) with x: 
    return map(lambda (key,x):x, TempList) 

For instance, I wrote code to sort a list of points according to their distance from 
the origin. Using SortByKey (instead of passing a function to sort) made the code 
roughly three times faster. 
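In Python 3, sort() and sorted() no longer accept a comparison function. Their key argument performs the decorate-sort-undecorate step that SortByKey does by hand. Below is a minimal sketch of the point-distance example, assuming points are (x, y) tuples; the sample data is made up for illustration.

```python
import functools
import math

# Hypothetical sample data: (x, y) points.
points = [(3, 4), (1, 1), (0, 2), (-5, 0)]

# key= computes each element's key once. The sort is stable, so elements
# with equal keys keep their original order and are never compared directly.
by_distance = sorted(points, key=lambda p: math.hypot(p[0], p[1]))

# A Python 2-style comparison function still works through cmp_to_key,
# but it is called for every comparison rather than once per element.
def compare_distance(a, b):
    da, db = math.hypot(*a), math.hypot(*b)
    return (da > db) - (da < db)

by_distance_cmp = sorted(points, key=functools.cmp_to_key(compare_distance))
```

Here (3, 4) and (-5, 0) are both at distance 5.0. Unlike SortByKey, the key-based sort leaves them in their original order instead of comparing the tuples themselves.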

Looping 

Use xrange for looping across long ranges; it uses much less memory than range, 
and may save time as well. Both versions are likely to be faster than a while loop: 

for x in range(10000): pass # memory hog 
for x in xrange(10000): pass # good! 

You can often eliminate a loop by calling map instead. 
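Claims like "faster than a while loop" are worth checking on your own machine, in line with the earlier rule of timing optimizations on realistic runs. Below is a minimal Python 3 sketch using the timeit module. In Python 3, range is lazy, like xrange in Python 2. The function names are hypothetical, and the timings it returns will vary by machine.

```python
import timeit

def sum_while(n):
    total, i = 0, 0
    while i < n:
        total += i
        i += 1
    return total

def sum_for(n):
    total = 0
    for i in range(n):  # lazy in Python 3, so no big list is built
        total += i
    return total

def time_both(n=100000, number=10):
    """Return (while_seconds, for_seconds) for `number` runs of each loop."""
    # Check that the versions agree before comparing their speed.
    assert sum_while(n) == sum_for(n)
    t_while = timeit.timeit(lambda: sum_while(n), number=number)
    t_for = timeit.timeit(lambda: sum_for(n), number=number)
    return t_while, t_for
```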


I/O 

Each call to a file’s readline method is quite slow. It is much faster to read the 
entire file into memory by calling readlines; however, this uses up a lot of RAM. 
Another approach is to read blocks of lines. Best of all, in Python 2.1, is to use 
the xreadlines method of a file: 

# 1 - Slow: 
while 1: 
    FileLine=file.readline() 
    if (FileLine==""): break # EOF 
    DoStuff(FileLine) 

# 2 - Fast, but possibly memory-intensive: 
FileLines=file.readlines() 
for FileLine in FileLines: 
    DoStuff(FileLine) 

# 3 - Faster than 1, without hogging too much memory: 
# (Use this for filelike objects without an 
# xreadlines() method) 
while 1: 
    FileLines=file.readlines(100) 
    if len(FileLines)==0: break # EOF 
    for FileLine in FileLines: 
        DoStuff(FileLine) 

# 4 - Fast and simple; requires version 2.1: 
for FileLine in file.xreadlines(): 
    DoStuff(FileLine) 
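xreadlines was deprecated in Python 2.3 and does not exist in Python 3. Iterating over the file object itself gives the same buffered, line-at-a-time reading. Below is a minimal Python 3 sketch; count_nonblank_lines and demo are hypothetical names, and demo writes a throwaway temporary file only to have something to read.

```python
import os
import tempfile

def count_nonblank_lines(path):
    """Count lines with non-whitespace content, reading one line at a time."""
    count = 0
    with open(path) as f:
        for line in f:  # the file object is its own line iterator
            if line.strip():
                count += 1
    return count

def demo():
    # Write a small temporary file, count its lines, then clean up.
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.write("first\n\nsecond\n   \nthird\n")
    try:
        return count_nonblank_lines(path)
    finally:
        os.remove(path)
```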


Strings 

Building up strings with the concatenation operator + can be slow, because it often 
involves copying strings several times. Formatting using the % operator is generally 
faster, and uses less memory. For example: 

HTMLString=HTMLHeader+HTMLBody+HTMLFooter # slow! 
HTMLString="%s%s%s"%(HTMLHeader,HTMLBody,HTMLFooter) # fast! 

If you are building up a string with an unknown number of components, consider 
using string.join to combine them all, instead of concatenating them as you go: 

import string 

# Slow way: 
Str="" 
for X in range(10000): 
    Str+='X' 

# Fast way (10 times as fast on my machine): 
List=[] 
for X in range(10000): 
    List.append('X') 
Str=string.join(List,"") 
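The string module's join function is gone in Python 3. The same idea is spelled with the string method, "".join(parts), which has been available since Python 2.0. Below is a minimal sketch with hypothetical function names. CPython can sometimes grow a string in place for s += ..., so the gap between the two versions may be smaller than the text reports; time it before relying on it.

```python
def build_slow(n):
    s = ""
    for _ in range(n):
        s += "X"  # may copy the growing string on each pass
    return s

def build_fast(n):
    parts = []
    for _ in range(n):
        parts.append("X")
    return "".join(parts)  # builds the result in a single pass
```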

When using regular expressions, create a reusable regular expression object using 
re.compile instead of calling re.search and re.match directly. This saves time, because 
the regular expression doesn’t have to be repeatedly parsed. Following is a 
contrived example: 

import re 

PATTERN=r"^[0-9]+(\.[0-9]+)*$" # Match valid version numbers 
ValidDottedDecimal=re.compile(PATTERN) 
for Index in range(100): 
    re.search(PATTERN,"2.4.5.3") # slow way! 
for Index in range(100): 
    ValidDottedDecimal.search("2.4.5.3") # fast way! 
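Below is a minimal Python 3 sketch that keeps the compiled pattern at module level, so it is parsed once no matter how often the hypothetical helper is_version runs. Modern versions of the re module also cache recently compiled patterns internally, so the difference may be smaller than in Python 2.1. Precompiling still skips that cache lookup on every call.

```python
import re

# Compiled once, at import time: digits separated by single dots.
VALID_DOTTED_DECIMAL = re.compile(r"^[0-9]+(\.[0-9]+)*$")

def is_version(text):
    """Return True if text looks like a dotted version number."""
    return VALID_DOTTED_DECIMAL.match(text) is not None
```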


Threads 

If your script uses only one thread, you can save time by forcing the interpreter 
to check for other threads less often. The function sys.setcheckinterval(codes) 
tells Python to consider switching threads after codes bytecodes. The default 
check interval is 10; setting it to something large (like 1,000) may improve your 
performance. On my Windows machine, the gain is negligible. 
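sys.setcheckinterval was deprecated in Python 3.2 and removed in 3.9. Its replacement, sys.setswitchinterval, takes a time in seconds (the default is 0.005) rather than a bytecode count. Below is a minimal sketch that changes the interval only around one call and then restores it; run_with_switch_interval is a hypothetical helper, and any speedup should be measured rather than assumed.

```python
import sys

def run_with_switch_interval(seconds, func, *args):
    """Call func(*args) with a temporary thread switch interval."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func(*args)
    finally:
        sys.setswitchinterval(old)  # always restore the previous setting
```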
Taking out the Trash—the Garbage Collector 

Python doesn’t require you to explicitly allocate and free memory. When you need 
more memory to hold some data, the Python interpreter allocates it for you. When 
you are finished with data, the interpreter usually gets rid of it. 

Python cleans up memory by using reference counting: For each chunk of memory 
you use, Python keeps track of how many references to the object exist. When you 
assign a reference to an object, its reference count increases; when you get rid of a 
reference, the object’s reference count decreases. When there are no more 
references to an object, Python frees the object’s memory: 

>>> class Thingy: 
        def __init__(self, Name): 
            self.Name=Name 
        def __del__(self): 
            print "Deleting:",self.Name 

>>> A=Thingy("X") # The variable A holds the only reference 
>>> A="Crunchy frog" # Refcount goes to 0 -> object is freed! 
Deleting: X 
>>> A=Thingy("X") 
>>> B=Thingy("Y") 
>>> A.ref=B # Y's refcount goes from 1 to 2 
>>> B=None # Y's refcount goes from 2 to 1 
>>> # This takes X's refcount to 0, so X is deleted. Deleting 
>>> # X takes Y's refcount to 0, so Y is deleted too: 
>>> A=None 
Deleting: X 
Deleting: Y 

Note that the del statement does not (necessarily) delete an object; it deletes 
a variable (and thus decrements the object’s reference count): 

>>> A=Thingy("X") 
>>> B=Thingy("Y") 
>>> A.ref=B 
>>> del B # Variable B is gone, but object Y still exists 
>>> A.ref.Name # See! Object Y is still there! 
'Y' 
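In CPython you can watch these counts change with sys.getrefcount. The value it reports is generally one higher than you might expect, because passing the object as an argument creates a temporary reference. For that reason, this minimal Python 3 sketch compares counts rather than trusting absolute numbers. refcount_demo is a hypothetical name, and the function is CPython-specific.

```python
import sys

def refcount_demo():
    """Return (before, during, after) reference counts for one object."""
    obj = object()
    before = sys.getrefcount(obj)   # includes the temporary argument reference
    holder = [obj]                  # the list slot is one more reference
    during = sys.getrefcount(obj)
    del holder                      # freeing the list drops that reference
    after = sys.getrefcount(obj)
    return before, during, after
```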

Reference counts and Python code 

Reference counting is different from automatic garbage collection (as seen in Java). 
For example, as long as two objects hold references to each other, Python won’t 
free them. If an object is no longer usable by a program, but its memory is not 
freed, the object is leaked. Leaked memory normally gets cleaned up when the 
program terminates. However, a program that runs for a long time can leak many 
megabytes of memory, a few bytes at a time. For example, after you run the 
following statements, the two objects each have a reference count of 1, and so will stick 
around until you exit the interpreter: 


>>> A=Thingy("X")
>>> B=Thingy("Y")
>>> A.ref=B              # Y's refcount is now 2
>>> B.ref=A              # X's refcount is now 2
>>> del A
>>> del B
>>> # Congratulations! You just leaked memory!


Normally, these memory leaks are not big or common enough to worry about. If you 
find yourself running low on memory, however, you may need to start worrying. In 
order to rid yourself of an object, you must get rid of all references to it — and to do 
that, you must keep track of all the references. 
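One way to confirm that you really have dropped every reference is a weak reference, which tracks an object without adding to its reference count. The sketch below is an illustration using the standard weakref and gc modules, with a Thingy class that has no __del__ method; make_pair is a hypothetical helper. The first pair is leaked exactly as in the session above, while the second has its cycle broken by hand (A.ref = None) before the variables are deleted, so reference counting alone frees both objects. gc.disable() is there only so that the separate cycle detector CPython has shipped since version 2.0 cannot step in during the demonstration.

```python
import gc
import weakref

class Thingy(object):
    def __init__(self, Name):
        self.Name = Name

def make_pair():
    """Build the X <-> Y cycle from the text; return both objects and weak refs."""
    A = Thingy("X")
    B = Thingy("Y")
    A.ref = B
    B.ref = A
    return A, B, weakref.ref(A), weakref.ref(B)

gc.disable()    # keep the cycle detector out of this demonstration
try:
    A, B, wx, wy = make_pair()
    del A, B
    leaked = wx() is not None and wy() is not None   # the cycle keeps both alive

    A, B, wx, wy = make_pair()
    A.ref = None    # break the cycle first...
    del A, B        # ...so refcounts reach 0 and both objects are freed at once
    freed = wx() is None and wy() is None
finally:
    gc.enable()

print(leaked)   # True
print(freed)    # True
```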

Reference counts and C/C++ code 

Shooting yourself in the foot is downright difficult in Python, but very easy in C. 
When writing C extensions, you must keep track of the reference counts of each 
Python object. Losing track of reference counts can lead to memory hemorrhaging 
(as opposed to mere memory leaks), and even core dumps. 

The macros Py_INCREF(x) and Py_DECREF(x) increment and decrement the reference
counts of a Python object x. At any given time, each reference is considered to
be owned by some function. When that function exits, it must transfer ownership of
the reference, or else get rid of the reference with a call to Py_DECREF. A function
can also borrow a reference: the borrower uses the reference, never touches the
reference count, and lets go of the reference before the owner does. Owning and
borrowing are not explicit in the code, but the comments generally indicate to
whom a reference belongs.

When writing C extensions, it is important to track reference counts properly.
Building Python with Py_REF_DEBUG and Py_TRACE_REFS defined provides extra
information for debugging reference counts. In addition, you can call
_Py_PrintReferences to print out all the objects and their refcounts.


Summary 

Debugging is never a painless process, but pdb helps make it as easy as possible. In
addition, IDEs like PythonWin provide debugging with a snappier interface. The
Python profiler helps you find bottlenecks in your code. In addition, understanding
Python's garbage collector can save a lot of memory. In this chapter, you:

♦ Debugged Python programs using pdb.

♦ Profiled code to find the most expensive functions.

♦ Optimized various types of code.

♦ Learned how to leak memory (and how not to leak memory).

In the next chapter, you learn how to combine the speed of C with the power of
Python by writing C extensions.



Security and Encryption



With the explosive growth of the Internet and with
countries shifting to more global economies, the 
issue of security is increasingly important. Banks, businesses, 
governments, and consumers routinely transfer sensitive 
information; and computers attached to the Internet are 
potentially accessible by anyone. This chapter describes 
Python’s modules for creating digital fingerprints of messages, 
running Python code in a safe sandbox, and using basic 
encryption and decryption. Online, you can also find strong 
encryption extension modules for triple DES, Blowfish, and 
the like. 


In This Chapter

Checking passwords

Running in a restricted environment

Creating message fingerprints

Using 1940s-era encryption


Checking Passwords 

The most basic and common form of security is restricting
access until the user enters a valid username and password.
When you prompt a user for his or her password, however,
you don't want the password to be displayed on the screen,
lest a “neighbor” with wandering eyes sees the password too.
For these situations, use the getpass module.


There really isn't a safe and platform-independent way to have
the user enter a password, so getpass has a different implementation
for UNIX, Windows, and Macintosh systems. If for
some reason it can't use any of the special implementations, it
will at least warn the user that the password might accidentally
be displayed on the screen.



The Windows version uses getch() in the msvcrt module,
which doesn't behave quite how you'd expect in some
IDEs, such as IDLE or PythonWin, so if you want to try
getpass out, run it in an interpreter started on the command line.
Use the getpass([prompt]) function to request the user's password. The following
function queries the user for a name and password and then returns them as a
tuple:

>>> import getpass
>>> def getLogin():
...     name = raw_input('Name:')
...     passwd = getpass.getpass('Password:')
...     return name, passwd
...
>>> getLogin()
Name:robinhood
Password:                # Characters typed are not echoed
('robinhood', 'littleJohn')

getpass also has the getuser() function, which returns the login name of the
user:

>>> getpass.getuser()
'dave'


getuser checks the values of the LOGNAME, USER, LNAME, and USERNAME environment
variables, returning the value of the first one that is present and not empty. If
all fail and the system has the pwd module (UNIX), then that is used; otherwise, an
exception is raised.
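The lookup order just described can be sketched as a small function. login_name_from is a hypothetical helper written for illustration (it is not part of getpass); it takes the environment as a plain dictionary so you can try different combinations, and it returns None where getuser() itself would go on to try the pwd module. The last lines check that the real getuser() agrees when LOGNAME is set.

```python
import getpass
import os

# The environment variables getuser() consults, in the order given above.
LOGIN_VARS = ('LOGNAME', 'USER', 'LNAME', 'USERNAME')

def login_name_from(environ):
    """Return the first login variable that is present and non-empty, else None."""
    for var in LOGIN_VARS:
        value = environ.get(var)
        if value:
            return value
    return None

# USER is present but empty, so the search continues on to USERNAME.
print(login_name_from({'USER': '', 'USERNAME': 'dave'}))   # dave

# LOGNAME comes first, so setting it decides what getuser() returns.
os.environ['LOGNAME'] = 'dave'
print(getpass.getuser())                                    # dave
```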

Cross-Reference

Chapter 38 covers the pwd module, which you can use for accessing the UNIX
user account database. You can also use the crypt module to check whether the
password a user entered is correct (i.e., matches their login password).


Most GUI toolkits have their own method for prompting a user for a password. For
example, in wxPython, you can set a flag in the text entry field that holds a
password, so that anything typed is displayed with asterisks.

Note UNIX users who have the readline module activated need not worry that, after
entering their password, it will show up in the command history. getpass uses its
own implementation of raw_input in order to avoid that security hole.


Running in a Restricted Environment 

Imagine that you decided to create an online game in which players from all over
the world would upload virtual robots to traverse a maze and destroy one another.
Not only did you decide to implement most of the game in Python, you chose to let
the players program their robots using Python too. One problem, though, is that a
malicious entrant could include code to erase files on your computer, install a
Trojan horse, or cause damage in any number of other ways. How could you deal
with that danger?
The rexec sandbox

The rexec module helps prevent such a scenario. It enables you to run Python
code in a sandbox, a restricted environment in which you control what resources it
can access, just as a child in a sandbox imagines it to be the entire world when in
reality it's quite isolated and small. You can, for example, enable Python programs
to do whatever they want as long as they don't try to create any socket connections,
or enable them to create files only in a particular directory. With rexec, you
can more safely run Python programs that didn't necessarily come from a trusted
source.


To create an execution sandbox, you call RExec([hooks[, verbose]]) to create an
RExec object (or call the constructor of a subclass you've created in order to
override or add to its access policies). hooks is an instance of the RHooks class (or a
subclass), which is itself a subclass of the Hooks class in the ihooks module; its
methods are called when it's time to import a module. By providing your own import
hooks, you can monitor or log what modules are loaded, or even load them from a
different source. The verbose argument is passed to the Hooks constructor and,
if 1, prints extra debugging information.

Cross-Reference

Refer to Chapter 34 for information about the ihooks module and implementing
your own module import behavior.

Before creating your RExec instance object, you can change some of its class
variables to tailor what modules and functions will be available to the executing code.
(Changing these class variables does not affect instances already created; only
those created after your changes will see the effects.) For security reasons, you
should be careful about what values you change. If you want to change the list of
prohibited built-in functions, for example, consider adding to the list instead of
completely replacing it, so that you don't inadvertently create a security hole.
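rexec can't be imported in current versions of Python, so the sketch below shows the recommended pattern with two hypothetical stand-in classes, BasePolicy and StricterPolicy, rather than RExec itself. The new tuple is built from the inherited one, so every name that was already forbidden stays forbidden. The starting tuple copies the nok_builtin_names default described later in this section; the two extra names are our own example.

```python
class BasePolicy(object):
    # Stand-in for RExec's class variable listing forbidden built-ins.
    nok_builtin_names = ('open', 'reload', '__import__')

class StricterPolicy(BasePolicy):
    # Extend the inherited tuple instead of replacing it, so nothing that
    # was already forbidden is accidentally allowed again.
    nok_builtin_names = BasePolicy.nok_builtin_names + ('eval', 'execfile')

print(StricterPolicy.nok_builtin_names)
```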

ok_path is a tuple containing the paths to search when importing a module. By
default, it matches the value of sys.path.


ok_builtin_modules is a tuple of built-in (not implemented in Python) modules
that are safe to import. The default value contains the names of the following
modules:

audioop     imageop     parser     strop
array       marshal     regex      struct
binascii    math        pcre       time
cmath       md5         rotor
errno       operator    select


ok_posix_names is a tuple of allowed functions from the os module (if present in
the current platform's implementation of os). The default value contains the
following names:

error       readlink    getpid     getgid
fstat       stat        getppid    geteuid
listdir     times       getcwd     getegid
lstat       uname       getuid


ok_sys_names is a tuple of variables and functions from the sys module that
restricted access programs can use. The default value contains the following:

ps1    copyright    platform    maxint
ps2    version      exit

nok_builtin_names is a tuple of built-in function names that programs are not
allowed to use. By default, the list contains 'open', 'reload', and '__import__',
so functions such as map are still allowed (most built-in functions are relatively
safe).

RExec intercepts calls from the restricted program to import, reload, and unload
and routes the calls through the internal module loader and importer (which makes
use of the custom import hooks). You can override RExec's r_reload(module),
r_unload(module), and r_import(modulename[, globals[, locals[, fromlist]]])
methods to provide custom behavior. If a module isn't safe to be loaded,
r_import should raise an ImportError exception.
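The allow-list idea behind ok_builtin_modules and r_import can be sketched in a few lines of ordinary Python. OK_MODULES and checked_import are hypothetical names, and this is not rexec's implementation: an import check on its own is not a sandbox, because code that gets hold of an allowed module may still reach others through it. It only illustrates the rule that anything off the list raises ImportError.

```python
# Hypothetical allow-list, in the spirit of ok_builtin_modules.
OK_MODULES = ('math', 'struct', 'time')

def checked_import(modulename, allowed=OK_MODULES):
    """Sketch of an r_import-style check: refuse anything not on the list."""
    if modulename not in allowed:
        raise ImportError('restricted code may not import %s' % modulename)
    return __import__(modulename)

m = checked_import('math')
print(m.sqrt(16.0))      # 4.0

try:
    checked_import('os')
except ImportError as exc:
    print(exc)           # restricted code may not import os
```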

Calls to open are sent to RExec's r_open(filename[, mode[, bufsize]]). By
default, files can be opened for reading only, but you can override this with
different behavior if needed.
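Here is a minimal sketch of a read-only policy like the default just described, written as plain functions so it runs without rexec. read_only_open and is_allowed_mode are hypothetical names, and treating only 'r' and 'rb' as reading modes is our assumption about what "reading only" means. In an RExec subclass, logic like this would live in an overridden r_open method.

```python
ALLOWED_MODES = ('r', 'rb')    # assumption: the only modes that just read

def is_allowed_mode(mode):
    """True if mode only reads from the file."""
    return mode in ALLOWED_MODES

def read_only_open(filename, mode='r', bufsize=-1):
    """Hypothetical stand-in for a read-only r_open policy."""
    if not is_allowed_mode(mode):
        raise IOError('files may only be opened for reading here')
    return open(filename, mode, bufsize)

try:
    read_only_open('cmd.exe', 'w+b')    # the kind of call untrusted code might make
except IOError as exc:
    print('refused: %s' % exc)
```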


Once you (finally!) have your RExec object, you can actually execute Python code
in a restricted environment by calling its r_eval(code), r_exec(code), or
r_execfile(filename) methods, all of which run the code in the __main__ module of
your new sandbox. r_eval takes as an argument either a Python expression as a
string or a compiled code object, and returns the value of the expression:

>>> import rexec
>>> r = rexec.RExec()
>>> r.r_eval('2 + 2')
4


r_exec can take a string containing one or more lines of Python code or a compiled
code object:

>>> s = """
... print 'My name is George'
... q = range(10)
... print q
... """
>>> r.r_exec(s)
My name is George
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Cross-Reference

Chapter 33 has information about code objects and Python introspection
capabilities.

r_execfile executes the contents of a Python file in the restricted environment.
For example, first save the code in Listing 28-1 to a file called bad.py.


Listing 28-1 : bad.py — "Untrusted" code to test in the rexec 
sandbox 


SECRET_VIRUS_CODES = '...<bad stuff here>...'
f = open('cmd.exe', 'w+b')    # This will fail
f.write(SECRET_VIRUS_CODES)
f.close()


When you run this file with r.r_execfile('bad.py'), RExec halts with an exception as
soon as the program tries to do something illegal (in this case, open a file for writing).

The RExec's add_module(modulename) method returns a module object existing in
the restricted environment (loading it first if necessary). Because __main__ is also
a module, you can use this as a gateway between the normal and restricted
environments. For example, you can have some variables already present when the
restricted code runs:

>>> r = rexec.RExec()
>>> rmain = r.add_module('__main__')
>>> rmain.happyFactor = 10
>>> r.r_eval('happyFactor * 2')
20

You can also use it to retrieve values after the code has finished. Continuing the
previous example:

>>> r.r_exec('sadFactor = happyFactor / 2') 

>>> rmain.sadFactor 
5 

For each r_<func> method (such as r_eval and r_exec), RExec also has a
corresponding s_<func> method that behaves similarly except that the s_<func>
version will have access to restricted versions of stdin, stdout, and stderr. The
restricted versions have the following methods:

fileno    read         seek    writelines
flush     readline     tell
isatty    readlines    write
RExec handles some security problems for you, but there are other things to
consider too. For example, nothing in RExec protects against code with an infinite loop,
or even one that rapidly creates objects until it consumes all available memory.


Using a class fortress 

Most classes were not designed with restricted execution in mind. By using the
Bastion module, you can create wrappers for objects that are suitable for use with
rexec. The wrapped version of the object has the same attributes as the original,
but code in a restricted environment can access the attributes only if the wrapper
allows it.

Call Bastion(object[, filter[, name[, bastionclass]]]) to create a wrapper,
where object is the object you wish to wrap. filter is a function that accepts
an attribute name and returns true if that attribute can be used (the default filter
grants access to all attributes that do not start with an underscore). If the function
returns false, the wrapper will raise an AttributeError. The name argument is the
name to use when printing the object; bastionclass is an alternate wrapper class
to use, although you would rarely need to supply your own.

As an example, suppose your robot game provides each robot with a reference to
an Environment object through which the robot can query information about the
“world” in which it is running (for example, number of robots still alive, amount of
time left in the current round, and so on). The robots call different Get methods,
but outside the restricted environment, the rest of your program can set various
world attributes via some Set methods:

class Environment:
    def __init__(self):
        self._robots = 0
        self._timeLeft = 0

    def SetNumRobots(self, num):
        self._robots = num

    def GetNumRobots(self):
        return self._robots

    def SetTimeLeft(self, left):
        self._timeLeft = left

    def GetTimeLeft(self):
        return self._timeLeft

In order to make sure a player doesn’t fiddle with the time left in the game, for 
example, you can give the robots a “bastionified” version of the environment, one 
that doesn’t grant access to the ‘set’ methods: 
def noPrivateOrSet(name):
    if name[:1] == '_':              # hide private attributes
        return 0
    if name[:3].lower() == 'set':    # hide SetNumRobots, SetTimeLeft, ...
        return 0
    return 1

import Bastion, rexec
e = Environment()
be = Bastion.Bastion(e, noPrivateOrSet)

Now your main code could make calls like the following: 

e.SetNumRobots(5) 


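Code running in the restricted environment, however, could not. Assuming you have
made the wrapper available to the sandbox under the name environment (for example,
with r.add_module('__main__').environment = be), this next call would raise an
AttributeError exception:

r.r_exec('environment.SetTimeLeft(100)')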



As with access policies in rexec, the planning and consideration you use when
designing a Bastion filter should be proportional to the damage that could occur
if you leave a security hole open. It's best to err on the side of being overly
restrictive so that later you're not sorry.
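Like rexec, Bastion is no longer part of Python (both modules were disabled in Python 2.3 and are absent from Python 3). The sketch below imitates the idea in plain Python so you can experiment with filter design. make_bastion and BastionWrapper are hypothetical names, the wrapped object is reachable only through a closure, and Environment and noPrivateOrSet are restated from the text so the block runs on its own. It shows attribute filtering only and is not a security boundary: introspection of the wrapper's method closure can still reach the original object.

```python
def make_bastion(obj, allow):
    """Hypothetical Bastion-style wrapper: only names allow() accepts get through."""
    def get(name):
        if not allow(name):
            raise AttributeError(name)
        return getattr(obj, name)

    class BastionWrapper(object):
        __slots__ = ()                  # no instance attributes of its own
        def __getattr__(self, name):    # called for any name the wrapper lacks
            return get(name)

    return BastionWrapper()

class Environment(object):
    def __init__(self):
        self._robots = 0
        self._timeLeft = 0
    def SetNumRobots(self, num):
        self._robots = num
    def GetNumRobots(self):
        return self._robots
    def SetTimeLeft(self, left):
        self._timeLeft = left
    def GetTimeLeft(self):
        return self._timeLeft

def noPrivateOrSet(name):
    return not (name[:1] == '_' or name[:3].lower() == 'set')

e = Environment()
be = make_bastion(e, noPrivateOrSet)
e.SetNumRobots(5)                # trusted code uses the real object
print(be.GetNumRobots())         # 5
try:
    be.SetTimeLeft(100)          # blocked by the filter
except AttributeError as exc:
    print('refused: %s' % exc)   # refused: SetTimeLeft
```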


Creating Message Fingerprints 

A message digest is like a digital fingerprint or hash that can be used in security and 
data integrity checks. For any string of bytes, the corresponding fingerprint will 
change if the original string of bytes changes. 

One common use for these types of digests is to verify that a file transferred
correctly across an unreliable network connection. For example, a Web site with
downloadable ZIP files might list the digital fingerprints next to each file. After
downloading a file, you compute the fingerprint of what you downloaded, and if the
two match, you know the file transferred without errors.

It’s mathematically infeasible to create a file whose fingerprint has a chosen value. 
That is to say, if someone knows the fingerprint of a file, for example, and wants to 
create another file with the same fingerprint, they aren’t going to succeed. In the 
example just described, this property of message digests verifies that what you 
download truly matches what is on that remote Web server (and that someone 
along the network route didn’t slip you a different version of the file that contains a 
virus or something). 
MD5

The md5 module implements the MD5 message digest algorithm (developed by MIT
and RSA Data Security, Inc.) to generate 128-bit message digests for arbitrarily long
strings of bytes. Create a new md5 object by calling the new([msg]) function. You
then repeatedly call the object's update(msg) method until you have passed it all
your data. At any time, you can call the digest() method to get the md5 checksum
at that point in time:

>>> import md5
>>> m = md5.new()
>>> data = open('c:\\temp\\skeleton.exe', 'rb').read()
>>> m.update(data)
>>> m.digest()
'\252\221\205\274\015\317\032\304\207\266\312~$\032\204 '
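The md5 module was later folded into hashlib (available since Python 2.5; the separate md5 and sha modules are gone in Python 3), whose objects have the same update(), digest(), hexdigest(), and copy() methods. The sketch below uses hashlib.md5 to apply the repeated-update pattern to a file read in fixed-size chunks, so a large file never has to be held in memory all at once. md5_of_file and the 64 KB chunk size are our own choices, and a throwaway temporary file stands in for skeleton.exe.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=65536):
    """Hypothetical helper: hash a file in chunks with repeated update() calls."""
    m = hashlib.md5()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            m.update(chunk)
    return m.hexdigest()

data = b'spam' * 50000              # 200,000 bytes, so several chunks
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as f:
    f.write(data)

same = md5_of_file(path) == hashlib.md5(data).hexdigest()
os.remove(path)
print(same)   # True: chunked updates give the same digest as one big update
```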

Using the optional argument to the new function is the same as calling new without 
any arguments and then passing msg to update: 

>>> m = md5.new('The quick brown fox')
>>> m.digest()
'\242\000O7s\013\224Eg\012s\217\240\374\236\345'
>>> m = md5.new()
>>> m.update('The quick brown fox')
>>> m.digest()
'\242\000O7s\013\224Eg\012s\217\240\374\236\345'

The digest is in binary form, so md5's hexdigest() method returns a printable
hexadecimal version of the current digest (this is the text you'd display next to the file
on the Web site, for example):

>>> m.hexdigest() 

'a2004f37730b9445670a738fa0fc9ee5' 

If two strings share a common initial substring, you can process the common
portion of the two strings first and then create a copy of the object using its copy()
method, after which you'd use the two copies to continue computing the digest.
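Here is the shared-prefix technique in code, using hashlib.md5 (the later home of md5.new) so it runs on current Python. The prefix and the two endings are made-up sample strings. Each copy carries on from the state after the prefix, and the final check confirms that the results equal hashing each complete string from scratch.

```python
import hashlib

prefix = b'Dear Sir or Madam, '
base = hashlib.md5()
base.update(prefix)              # hash the shared part only once

m1 = base.copy()                 # each copy continues from the shared state
m1.update(b'you have won!')
m2 = base.copy()
m2.update(b'your account is overdrawn.')

# Same results as hashing each complete string from the beginning:
print(m1.hexdigest() == hashlib.md5(prefix + b'you have won!').hexdigest())
print(m2.hexdigest() == hashlib.md5(prefix + b'your account is overdrawn.').hexdigest())
```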


SHA 

The sha module implements the National Institute of Standards and Technology’s 
Secure Hash Algorithm. It is slightly slower than MD5, but the digest it produces is 
larger (160 bits). Therefore, it is more secure against brute force-style attacks. 

You use the sha module just as you do the md5 module: 

>>> import sha
>>> s = sha.new()
>>> s.update('Python')
>>> s.hexdigest()
'6e3604888c4b4ec08e2837913d012fe2834ffa83'

Like md5, a sha object has update(msg), digest(), hexdigest(), and copy()
methods.
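The size difference between the two digests is easy to check with hashlib, where sha1 is the same 160-bit Secure Hash Algorithm the sha module provided. This small sketch only reports sizes; digest_size is given in bytes, and a hex digest uses two characters per byte.

```python
import hashlib

m = hashlib.md5(b'Python')
s = hashlib.sha1(b'Python')      # the sha module's algorithm (SHA-1)

print(m.digest_size * 8)         # 128 bits
print(s.digest_size * 8)         # 160 bits
print(len(s.hexdigest()))        # 40 hex digits
```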

Other uses 

One nice property of message fingerprints is that the slightest change in the
message results in a very different fingerprint:

>>> sha.new('Star wars').hexdigest()
'7dede4f3d3fa32215aad874a34225a9a159addfe'
>>> sha.new('Star wart').hexdigest()
'4d87932ef50601c54a4e83182a92063302ccfe31'

In the preceding example, out of the entire string only one byte changed; its value
was incremented by 1. Despite the tiny change, the digest is completely different.
This makes it nearly impossible to hide or mistakenly overlook even small changes
to a message or file.
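To put a number on "completely different", the sketch below counts how many bits differ between the two inputs and between their SHA-1 digests. differing_bits is a hypothetical helper. Changing 's' (0x73) to 't' (0x74) flips 3 of the 72 input bits; for a well-behaved hash, roughly half of the 160 output bits are expected to flip, though the exact count depends on the inputs.

```python
import hashlib

def differing_bits(a, b):
    """Count bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count('1') for x, y in zip(bytearray(a), bytearray(b)))

d1 = hashlib.sha1(b'Star wars').digest()
d2 = hashlib.sha1(b'Star wart').digest()

print(differing_bits(b'Star wars', b'Star wart'))   # 3
print(differing_bits(d1, d2))                       # typically around 80 of 160
```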

Message digests can also be useful for performing rapid comparisons of large 
objects. If you have a list or tree of large images, for example, you could compute 
the checksum of each image as it is added to the list. When it comes time to add a 
new image, you compute its checksum value and then rapidly compare it against 
the other checksums to make sure it is not already in the list (comparing a 128-bit 
MD5 digest is a lot cheaper than comparing two images). 
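A minimal sketch of the duplicate check just described, with a hypothetical ImageStore class and made-up byte strings standing in for image data. Keeping the digests in a set, rather than comparing against each stored checksum in turn, is our own design choice. Equal digests mean equal data only with overwhelming probability, and deliberately constructed MD5 collisions are possible, so if untrusted parties supply the images, compare the actual bytes on a digest match or use a stronger hash.

```python
import hashlib

class ImageStore(object):
    """Hypothetical container that rejects exact duplicates by digest."""

    def __init__(self):
        self.images = []
        self._seen = set()                 # digests of everything stored so far

    def add(self, data):
        """Store data unless identical bytes are already present.

        Returns True if stored, False if it was a duplicate.
        """
        key = hashlib.md5(data).digest()   # 16 bytes: cheap to hash and compare
        if key in self._seen:
            return False
        self._seen.add(key)
        self.images.append(data)
        return True

store = ImageStore()
print(store.add(b'\x89PNG...first image bytes...'))    # True
print(store.add(b'\x89PNG...second image bytes...'))   # True
print(store.add(b'\x89PNG...first image bytes...'))    # False (duplicate)
```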


Using 1940s-Era Encryption 

The rotor module implements a basic encryption scheme using groups of
permutations or rotors that map one character to another. Each character of a message is
encrypted using the rotor, and the initial rotor positions are the “key” that can be
used to decrypt the message.

The most famous use of rotor-based encryption was by the German military during 
World War II. They used Enigma (a typewriter-like machine originally built for use 
by businesses) to transmit orders and allied ship coordinates without having to 
worry about the messages being understood by others. Fortunately for the allied 
troops, a few Enigma machines were captured, and a team of British geniuses 
cracked the codes. (You can see an entertaining but historically inaccurate version 
of this story in the movie U-571.)

To create a new rotor object, call newrotor(key[, numrotors]). Like the message
you intend to encrypt, key can contain binary and not just printable text
characters. numrotors defaults to 6; using more rotors is more secure, but more costly to
encrypt and decrypt:
>>> import rotor
>>> r = rotor.newrotor('Will I ever tire of spam?', 10)
>>> msg = r.encrypt('Move into position at midnight')
>>> msg
'5\232\001A\267\312\375d\340I\375\201\315)\324\214\311...
>>> r.decrypt(msg)
'Move into position at midnight'

Obviously, both the sender and the receiver need the key. One way to handle this is
to have a predefined set of keys such that each party knows when to use which one
(for example, on Tuesdays use key #12). Another way is to transfer the key to your
partner without letting others realize that it's a key (work it into a casual
conversation about the surprise of the Spanish Inquisition, for example).

Calls to encrypt(msg) and decrypt(msg) reset the rotors to their initial state and
encrypt or decrypt the message. If the message is in more than one part, however,
you can subsequently call encryptmore(msg) and decryptmore(msg) instead;
these methods do the same thing without first resetting the rotors:

>>> msg1 = r.encrypt('The ATM PIN is 1234')
>>> msg2 = r.encryptmore('My lucky number is 15')
>>> r.decrypt(msg1)
'The ATM PIN is 1234'
>>> r.decryptmore(msg2)
'My lucky number is 15'
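The rotor module is no longer part of Python, so here is a hypothetical single-rotor toy written from scratch to show the two ideas above in runnable form. ToyRotor is our own class: it does not reproduce the rotor module's algorithm or ciphertext, and it is not secure. The key seeds a shuffle of the 256 byte values (the rotor's "wiring"), the rotor steps one position after every character, encrypt() and decrypt() reset it to the start, and encryptmore() and decryptmore() continue from wherever the previous call left it.

```python
import random

class ToyRotor(object):
    """Hypothetical one-rotor cipher for illustration only; NOT secure and
    NOT compatible with the old rotor module."""

    def __init__(self, key):
        table = list(range(256))
        random.Random(key).shuffle(table)    # the key decides the wiring
        self._forward = table
        self._backward = [0] * 256           # inverse permutation for decrypting
        for index, value in enumerate(table):
            self._backward[value] = index
        self._pos = 0                        # current rotor position

    def _run(self, data, decrypting):
        out = bytearray()
        for byte in bytearray(data):
            if decrypting:
                out.append((self._backward[byte] - self._pos) % 256)
            else:
                out.append(self._forward[(byte + self._pos) % 256])
            self._pos = (self._pos + 1) % 256    # step one notch per character
        return bytes(out)

    def encrypt(self, data):
        self._pos = 0                        # reset, then encrypt
        return self._run(data, False)

    def encryptmore(self, data):
        return self._run(data, False)        # continue from the current position

    def decrypt(self, data):
        self._pos = 0
        return self._run(data, True)

    def decryptmore(self, data):
        return self._run(data, True)

r = ToyRotor('Will I ever tire of spam?')
msg1 = r.encrypt(b'The ATM PIN is 1234')
msg2 = r.encryptmore(b'My lucky number is 15')
print(r.decrypt(msg1).decode('ascii'))       # The ATM PIN is 1234
print(r.decryptmore(msg2).decode('ascii'))   # My lucky number is 15
```

Because each call to encrypt() starts from the same position, encrypting the same text twice gives the same result, while encryptmore() on the same text gives a different one.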

You may think that using such old encryption technology is a waste because it is
relatively “easy” to crack (although still relatively difficult for most people).
Consider the security differences between a wooden fence and a ten-foot-tall
electric fence covered in razor wire. Although both can be circumvented by a determined
enough intruder, one is definitely stronger than the other. Likewise, a
completely foolproof encryption scheme does not exist, and probably never will.
Even the most basic encryption scheme will ward off 99 percent of potential
intruders simply because it's not worth the effort to crack, especially if you don't
advertise the type of encryption you use. Depending on your situation, something as
simple as rotor may be suitable (kind of like a chain-link fence with a “Beware of
Dog” sign).

Many modern encryption schemes use public-key encryption, in which each party
has a public and private key. Everyone has access to public keys; if someone wants
to send you a message, they encrypt their message using your public key. The keys
are generated in such a way, however, that only the person with the matching
private key can decrypt the message.
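To make the public/private split concrete, here is textbook RSA with the small, widely used example numbers p = 61 and q = 53. This is an arithmetic illustration only: real public-key systems use keys hundreds of digits long together with padding schemes, and nothing here should protect real data. The three-argument form of pow() performs modular exponentiation.

```python
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent: (n, e) is the public key
d = 2753                   # private exponent, chosen so that (e * d) % phi == 1

assert (e * d) % phi == 1

def encrypt(m, public=(n, e)):
    """Anyone holding the public key can do this."""
    modulus, exponent = public
    return pow(m, exponent, modulus)

def decrypt(c, private=(n, d)):
    """Only the holder of the private exponent d can undo it."""
    modulus, exponent = private
    return pow(c, exponent, modulus)

c = encrypt(65)
print(c)            # 2790
print(decrypt(c))   # 65
```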
Summary

You can protect sensitive information and ensure message integrity using some of 
the modules covered in this chapter. For example, in this chapter, you: 

♦ Used getpass to safely request the user to enter a password.

♦ Executed Python code in an environment with restricted access to different
resources.

♦ Calculated unique digital “fingerprints” for checking message integrity.

♦ Encrypted and decrypted messages using the rotor module.

In the next chapter, you learn how to add new functionality by writing your own 
extension modules. 



Writing Extension Modules


While Python excels as a stand-alone language, it also
shines as a glue language, a language that combines or
ties together “chunks” of functionality from other languages or
third-party libraries. After reading this chapter you’ll be able to
extend Python by writing your own C modules, and you’ll be
able to embed a Python interpreter in a C program.


This chapter is closely tied to the next chapter; together, the 
two chapters cover most of what you need to know to use the 
Python/C API. 


Extending and Embedding Overview

A Python extension is an external module written in C that
behaves as if it were just another Python module. In fact, to
most Python programs, an extension module is indistinguishable
from a “normal” module written in Python.

Python can interface with both C and C++ programs, but 
for conciseness, they are lumped together here and 
referred to as C programs. 



Why would you want to write an extension module? The most
common reason is to make available to Python programs a
third-party library written in some other language. It’s these
wrapper modules that enable Python programs to use
OpenGL, GUI toolkits such as wxWindows and Qt, and compression
libraries such as zlib. Why create something from
scratch if you can write a quick extension module around an existing C library?
Along the same lines, through an extension module, your Python programs can
access platform-specific functionality such as the Win32 APIs on Windows, or
low-level resources such as network sockets.

Another benefit of extension modules is that they run at the speed of compiled
code, rather than the slower speed of interpreted Python code. Python is often fast
enough “as is” even though it is an interpreted language, but if you do have special
performance requirements, you can move CPU-intensive operations into an extension
module. The approach I take is to first build my entire application in Python,
profile it, and then move performance bottlenecks into C as needed. This lets me
use Python as a rapid prototyping language in which I can still make changes
cheaply (when compared to C) without having to rewrite the entire program if a few
parts end up being too slow.

Proprietary information or algorithms locked away in an extension module are
more difficult to reverse engineer; and extension modules can significantly extend
the Python language itself by introducing new built-in data types.

The opposite of writing an extension module is embedding the Python interpreter
in a C program. This is useful if you have a lot of functionality that is just plain
easier to do in Python (what isn’t?), or when you have an existing application to which
you want to add Python power.

Embedded Python is great as an internal control language, or even as a sort of
macro language to customize the behavior of your application.

Because this chapter deals with combining C and Python, you do need a working
C compiler; and actually knowing how to program in C wouldn't hurt. If you don't
have a commercial compiler, compilers such as gcc are available free on all major
platforms, including Windows. If you have Microsoft Visual Studio, use that,
because Python comes with all the workspace and project files you need.

It is also a good idea to download and build Python from the source code. This 
ensures that your setup is correct and also makes it possible for you to debug your 
modules during development. 


Writing a Simple Extension Module 

The best way to understand extension modules is to look at a simple one. Listing 
29-1 is a C program that creates a Python module called simple, which contains the 
add and count functions. The Python documentation and the next section describe 
compiling and linking extension modules, so for now, just examine the source code. 


Listing 29-1: simple.c - A basic Python extension module


#include "Python.h"

// Add two arbitrary objects
static PyObject *simple_add(PyObject *pSelf, PyObject *pArgs)
{
    PyObject *pX, *pY;

    if (!PyArg_ParseTuple(pArgs, "OO", &pX, &pY))
        return NULL;

    return PyNumber_Add(pX, pY);
}

// A doc string
static char count_doc[] = "Returns the number of arguments passed in";

static PyObject *simple_count(PyObject *pSelf, PyObject *pArgs)
{
    long count = PyTuple_Size(pArgs);
    return PyInt_FromLong(count);
}

// Map of function names to functions
static PyMethodDef simple_methods[] =
{
    {"add", simple_add, METH_VARARGS, NULL},
    {"count", simple_count, METH_VARARGS, count_doc},
    {NULL, NULL}    // End of functions
};

// For C++, initsimple should be declared 'extern "C"'
DL_EXPORT(void) initsimple()
{
    Py_InitModule("simple", simple_methods);
}


The following example uses the preceding module after compiling and linking: 

>>> import simple
>>> simple.add(5, 2.5)                        # Add two numbers
7.5
>>> simple.add(['a','b'], ['c','d','e'])      # Add other types
['a', 'b', 'c', 'd', 'e']
>>> simple.count.__doc__
'Returns the number of arguments passed in'
>>> simple.count('hello', 'there', 5, 6)      # Count args
4
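Because an extension module looks like any other module to the code that imports it, one common arrangement (our suggestion, not something the listing requires) is to keep a pure-Python version of the same functions: it is the prototype you write before moving hot spots into C, and it serves as a fallback when the compiled module isn't available. The sketch below assumes only the add and count functions from Listing 29-1; if import simple fails, it defines Python equivalents that behave the same way for the examples above.

```python
try:
    from simple import add, count     # the compiled extension, if it is built
except ImportError:
    def add(x, y):
        """Add two arbitrary objects (pure-Python stand-in for simple.add)."""
        return x + y

    def count(*args):
        """Returns the number of arguments passed in"""
        return len(args)

print(add(5, 2.5))                        # 7.5
print(add(['a', 'b'], ['c', 'd', 'e']))   # ['a', 'b', 'c', 'd', 'e']
print(count('hello', 'there', 5, 6))      # 4
```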
Familiarize yourself with the pattern of the example C file because most extension
modules follow this basic form. After including the appropriate headers, it creates
two functions and a doc string.

Notice that each function is declared static, which means that it is not visible
outside this file. The functions are made visible to Python because each is listed in
the simple_methods table. When the module is imported, its initsimple function
is called, which in turn calls Py_InitModule to inform Python of a new module
called “simple” whose function pointers are in simple_methods. The file name of
the module should match its name used in the code, so the compiled form of this
module would probably be in simple.dll or simple.so, depending on the platform.
If Python can’t find an init<name> function (where <name> is the name of the
module), it will be unable to import the module.

Each module function takes two arguments and returns one, all of which are
PyObject pointers. The self argument (pSelf in Listing 29-1) is NULL unless the
function is actually a method for a Python class you've implemented in C; and args
(pArgs in the listing) is a Python tuple containing the arguments passed in.

The simple_add function calls PyArg_ParseTuple (a function discussed in detail
in “Converting Python Data to C” in this chapter) to break args into the objects in
the tuple; it takes a format string and pointers to receive the object references. In
this case, the format string “OO” says that the function is expecting any two
Python objects. simple_add takes the two objects and returns a new object by calling
PyNumber_Add. As shown in the example usage of this module, the objects can
be numbers, strings, and so on. PyNumber_Add is part of the Python/C API's
high-level abstract object layer.

Cross-Reference

Chapter 30 covers both the abstract and concrete object layers that enable you to
work with either general or very specific types of Python objects.

The simple_count function has its own doc string in count_doc, and it just
returns the number of arguments contained in args. Keep in mind that in Python,
even plain old numbers are actually objects. Therefore, before returning, the function
has to convert from the C long variable called count to an actual Python
object.

Tip The source code for Python includes the files that create the standard built-in
modules. These modules are a great source of examples of using the Python/C
API because they are widely used and well tested, and more than likely, you’re
already familiar with what they do.

You can create a module doc string by calling Py_InitModule3 instead of
Py_InitModule:

static char simple_doc[] =
"This is an example of a C extension module.\n\
Programming in C is great because\n\
it reminds me of how much fun Python is\n\
by comparison.";

DL_EXPORT(void) initsimple()
{
    Py_InitModule3("simple", simple_methods, simple_doc);
}


Tip The initsimple() function must be visible outside the library; on Windows, that
means using __declspec(dllexport) or a .DEF file to export the function from a DLL.


Building and Linking 


Before proceeding, you should download the Python source distribution and build
a debug version of at least the main executable and the runtime library. The source
comes with excellent build instructions (including an example project that you can
use as a template), so this section provides only a brief overview and a few tips.

You can now find debug Windows builds on the Python Web site
(www.python.org).



With an extension module, you have two options: It can be statically linked into 
Python, or your module can be dynamically loaded at run time when the user 
imports it. The latter option is easier if you want to distribute your module to other 
people, and it gives Python a smaller initial memory footprint. 

To statically link your module into Python for a UNIX build, you add it to the list of
modules to build in the Modules/Setup file; for Windows, add it to the PC\config.c
file. Then rebuild Python.

For dynamic linking, building the module is straightforward: 

1. Create a project (Makefile, IDE project file, and so on) that builds a shared 
object. This varies by platform, but for gcc you can use the link option 
-shared; for Windows, you create a DLL project. 

2. Add to the include search path the directory containing Python.h. 

3. Add to the link search path the directory containing the Python library (for 
example, pythonxx_d.lib or libpythonxx_d.a) and include the library in your 
list of files to link. 

4. Compile and link. 



If you're using Visual Studio, under the C/C++ tab in the Project Settings for your
module, be sure to choose the Code Generation category, and then choose Debug
Multithreaded under Use Run-time Library.


Tip The name of your module should match the name used in the source code.

When you create a debug build of Python, files have a _d appended to the end of
their names (for example, the executable is named python_d), and when loading
extension modules, Python looks for the same suffix, so debug versions of your module
should have that suffix as well. For example, if your module is named junk, normally your
extension would be built as junk.so, and the debug version should be named
junk_d.so.
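The naming rules above can be captured in a few lines of plain C. The sketch below is illustrative only: module_names is a hypothetical helper, not part of Python or its import machinery. Assuming the module name is already known, it builds the file name Python would look for (adding _d for a debug build) and the name of the init<name> function the module must export.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the Python/C API): given a module name,
   build the expected shared-library file name and the name of the init
   function Python calls when the module is imported.
   Returns 0 on success, -1 if either buffer is too small. */
int module_names(const char *name, int debug, const char *ext,
                 char *file_buf, size_t file_size,
                 char *init_buf, size_t init_size)
{
    int n;

    /* Debug builds append "_d" before the extension: junk_d.so */
    n = snprintf(file_buf, file_size, "%s%s%s", name, debug ? "_d" : "", ext);
    if (n < 0 || (size_t)n >= file_size)
        return -1;

    /* The init function is "init" followed by the module name: initjunk */
    n = snprintf(init_buf, init_size, "init%s", name);
    if (n < 0 || (size_t)n >= init_size)
        return -1;

    return 0;
}
```

In this sketch, only the file name changes between release and debug builds; the init function name is derived from the module name alone.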

Tip You can also name the debug and release versions of your module
<module>_d.pyd and <module>.pyd, and Python will load them correctly. The
.pyd extension is preferable to the system default (usually .so or .dll) because
your module may be a wrapper for an existing library of the same name (for example,
there already exists an opengl.dll file, so it's less confusing if the Python wrapper
module for it is named opengl.pyd).

The Python maintainers have done a lot of work to ensure that building extension
modules and Python itself goes as smoothly as possible. If your extension refuses to
build, don’t get discouraged: it’s probably something minor. Try starting with one of
the example modules and adding to it.

Tip If you install a compiler just so you can build Python extension modules, it'll save
you a lot of frustration if you take your time and make sure everything is set up
properly before attempting to build your module. First, build a stand-alone C program
(such as Hello World or something equally simple). Next, build Python from
the sources. If these two steps are successful, then proceed to build your extension
module.


Converting Python Data to C 

When a C extension function is called, it needs to unpack the arguments before it 
can operate on them. 

Unpacking normal arguments 

In the example extension module earlier in this chapter, the simple_add function
called the PyArg_ParseTuple function to unpack the Python function arguments.
The format string tells the type of objects your function expects; the different types
are listed in Table 29-1, along with the type of C variables to use. After the format
string, you pass one pointer for each value to be unpacked, and PyArg_ParseTuple
stores each unpacked value through its pointer.


Chapter29 > Writing Extension Modules 533 


Table 29-1
PyArg_ParseTuple Object Types

Format   Python Object                 C Variable Type(s)
i        integer                       int [1]
b        integer                       char
h        integer                       short
l        integer                       long
f        floating-point                float
d        floating-point                double
D        complex                       Py_complex
c        1-character string            char
s        string                        char *
s#       string or buffer              char *, int (stores length) [2]
z        string or None                char *
z#       string, buffer, or None       char *, int
es       string, Unicode, or buffer    const char *encoding, char **buffer
es#      string, Unicode, or buffer    const char *encoding, char **buffer, int
S        string                        PyStringObject *
O        any object                    PyObject *
O!       any object                    typeobject, PyObject *
O&       any object                    convert_func, anytype
t#       read-only char buffer         char *, int
w        read-write char buffer        char *
w#       read-write char buffer        char *, int
u        Unicode object                Py_UNICODE *
u#       Unicode object                Py_UNICODE *, int
U        Unicode string                PyUnicodeObject *

[1] For types that take a Python integer, you can pass in a Python long integer, but no range checking or
conversion is performed to ensure that the long value fits in an integer.

[2] Consult the Python online documentation for more information about buffer objects.
For the D format type, the complex number is stored in a C structure that has two
double members: real and imag.

Many format characters come in pairs that differ only by an appended pound sign
(for example, s and s#). The pound-sign version works the same
except that you supply two C variables, one to receive a pointer to the value and
another to receive the length of the string:

char * pStr;
int len;

if (!PyArg_ParseTuple(pArgs, "s#", &pStr, &len))
    return NULL;

As shown in this example, you do not provide the storage for formats that give you
a string value; PyArg_ParseTuple just gives you a pointer to the string. The only
exceptions to this rule are es and es#, which encode the value using the
encoding you provide (or the default encoding if encoding is NULL). For es, Python
allocates a buffer for the encoded value, and it is your responsibility to free it (with
PyMem_Free) when you’re finished. The es# format behaves a little differently if
buffer is not initially NULL: you can create your own buffer and pass it in along
with its maximum length. In both cases, the returned length will be the length of the
encoded value.

With the s format, the string passed to your function is NULL-terminated, so it obviously
can’t contain embedded NULL characters. With the s# format, however, any
Python string can be used. The z and z# formats work the same way except that the
argument may also be None in Python, in which case your C pointer will be set
to NULL.
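You can see the difference in plain C without the Python/C API at all. In this illustrative sketch, demo_bytes, demo_len, and visible_length are made-up names. The array holds the same five bytes as the Python string 'ab\0cd'. Code that relies on the terminating NULL, as the s format does, sees only two of them, while an explicit length such as the one s# reports covers all five.

```c
#include <string.h>

/* Six bytes: 'a', 'b', NUL, 'c', 'd', plus the NUL the compiler appends.
   A Python string such as 'ab\0cd' holds the first five of them. */
const char demo_bytes[] = "ab\0cd";

/* The explicit length that s# style access would report: 5 */
const size_t demo_len = sizeof demo_bytes - 1;

/* Code that relies on NUL termination, as the s format does,
   stops at the first NUL and sees only "ab". */
size_t visible_length(const char *s)
{
    return strlen(s);
}
```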

You can use the O format to get a pointer to a Python object instead of converting it
to a C data type. The O! format works the same except that you also supply a type
argument so that your function receives objects only of a certain type (a
TypeError is raised if the caller uses an incorrect type):

PyObject * pObject;

if (!PyArg_ParseTuple(pArgs, "O!", &PyList_Type, &pObject))
    return NULL;

The type names all follow the Py<Name>_Type convention; for example,
PyInt_Type, PyDict_Type, and so on. The S and U formats are shortcuts for O!
that ensure that the argument is a string or Unicode string.

By using the O& format, you can supply a conversion function for an object (which
can be useful if you have to perform the same conversion in many places):
typedef struct // An internally-used structure
{
    char * pIP;
    unsigned short port;
} Addr;

// Converts an IP address and port to an Addr struct
int addr_converter(PyObject *pObj, Addr *pAddr)
{
    return PyArg_ParseTuple(pObj, "sh", &pAddr->pIP,
                            &pAddr->port);
}

static PyObject *simple_addhost(PyObject *pSelf,
                                PyObject *pArgs)
{
    char * pName;
    Addr newA;

    if (!PyArg_ParseTuple(pArgs, "sO&", &pName, addr_converter,
                          &newA))
        return NULL;

    printf("Added host %s (%s:%d)\n", pName, newA.pIP, newA.port);
    return Py_BuildValue("");
}


Here’s the output of a call to simple_addhost:

>>> simple.addhost('Foo Corp.',('176.201.15.5',1234))
Added host Foo Corp. (176.201.15.5:1234)

The conversion function should return 1 for success and 0 for failure, and should 
also raise an exception if conversion fails. 
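The converter contract (return 1 on success, 0 on failure) does not depend on Python itself, so it can be sketched in plain C. In this hypothetical example, TextAddr, converter_fn, text_addr_converter, and parse_one are illustrative names. The converter parses text of the form "ip:port", and parse_one stands in for the way PyArg_ParseTuple passes an O& argument and the caller's storage to your function. A real converter receives a PyObject * and should also set a Python exception before returning 0.

```c
#include <stdio.h>
#include <string.h>

/* Plain-C stand-in for the Addr structure above; it owns its address
   text instead of pointing into a Python string. */
typedef struct {
    char ip[16];            /* room for "255.255.255.255" plus the NUL */
    unsigned short port;
} TextAddr;

/* Signature shared by converters in this sketch (hypothetical). */
typedef int (*converter_fn)(const char *text, void *out);

/* Same contract as an O& converter: 1 on success, 0 on failure.
   Parses "ip:port" into a TextAddr. */
int text_addr_converter(const char *text, void *out)
{
    TextAddr *addr = out;

    memset(addr, 0, sizeof *addr);   /* start from a known state */
    return sscanf(text, "%15[0-9.]:%hu", addr->ip, &addr->port) == 2;
}

/* Hypothetical stand-in for PyArg_ParseTuple's O& handling: hand the
   argument and the caller's storage to the converter, report the result. */
int parse_one(const char *arg, converter_fn convert, void *storage)
{
    return convert(arg, storage) ? 1 : 0;
}
```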

Chapter 30 covers raising and handling Python exceptions in C.


Python doesn’t increment an object’s reference count when it gives it to you via the
O formats, but very often in C extension modules, you will have to keep track of reference
counts. The next chapter covers this in more detail.

Using special format characters

PyArg_ParseTuple accepts a few special characters in its format string. The following
sections show you how to handle sequences and a variable number of arguments,
and how to generate error messages when callers supply incorrect
parameters.
Sequence unpacking

Instead of calling a conversion function, you can use parentheses in your format
string, and PyArg_ParseTuple will unpack sequence arguments on the fly:

int a, b, c, d;

if (!PyArg_ParseTuple(pArgs, "i(ii)i", &a, &b, &c, &d))
    return NULL;

The Python call to this function would take three arguments, the second of which is 
a sequence: 

simple.somefunc(5, (10,20), 8)
simple.somefunc(0, [1,2], 3)

You can also nest sequences: 

char *a, *b, *c, *d; 

if (!PyArg_ParseTuple(pArgs, "(((ss)s)s)", &a, &b, &c, &d))
return NULL; 

The corresponding Python call would be as follows:

simple.somefunc(((('This','is'),'really'),'ugly')) 


Optional and variable number arguments 

A pipe (|) character in the format list means that the remaining arguments to the
function are optional. You should initialize the corresponding C variables to their
default values:

int i, j = 15, k = 20;

if (!PyArg_ParseTuple(pArgs, "i|ii", &i , &j, &k)) 
return NULL; 

From Python, you could call this function in any of the following ways: 

simple.myfunc(10) 
simple.myfunc(10,15) 
simple.myfunc(10,15,20) 

You can use this method to create functions that handle a variable number of arguments,
but you do have to supply an upper bound on how many arguments the
user can pass in. If you truly need to handle a varying number of arguments, you
can avoid calling PyArg_ParseTuple altogether and process the pArgs variable
using the abstract and concrete object layers described in the next chapter.
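The pipe character is what gives a format string its lower and upper bounds. The following is a simplified, hypothetical sketch, not Python's own parsing code. format_arity walks a format string and reports the fewest and the most arguments it allows. It handles only the pieces this chapter covers: single-letter codes, the #, !, and & modifiers, es, parenthesized sequences counted as one argument, and a trailing : or ; section, which it skips.

```c
#include <string.h>

/* Hypothetical, simplified sketch (not Python's real parser): report the
   minimum and maximum number of arguments a PyArg_ParseTuple-style format
   string accepts. */
void format_arity(const char *fmt, int *min_args, int *max_args)
{
    int count = 0, depth = 0, optional_from = -1;
    const char *p;

    /* Stop at ':' or ';': the rest is a name or an error message. */
    for (p = fmt; *p != '\0' && *p != ':' && *p != ';'; p++) {
        switch (*p) {
        case '|':
            if (depth == 0)
                optional_from = count;   /* later arguments are optional */
            break;
        case '(':
            if (depth == 0)
                count++;                 /* a whole sequence is one argument */
            depth++;
            break;
        case ')':
            depth--;
            break;
        case '#': case '!': case '&':
            break;                       /* modifiers, not new arguments */
        case 'e':
            if (depth == 0)
                count++;
            if (p[1] != '\0')
                p++;                     /* "es" is a single code */
            break;
        default:
            if (depth == 0)
                count++;                 /* an ordinary one-letter code */
            break;
        }
    }
    *max_args = count;
    *min_args = (optional_from < 0) ? count : optional_from;
}
```

For example, "i|ii" yields a minimum of 1 and a maximum of 3, which matches the three ways of calling simple.myfunc shown above.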
Error messages

At the end of the format list, you can add a colon followed by a string to change the 
function name used if PyArg_ParseTupl e raises an exception: 

if (!PyArg_ParseTuple(pArgs, "iii:bleh", &i , &j, &k)) 
return NULL; 

Calling this function with the wrong number of arguments results in the following 
exception: 

>>> myFunc(1,2,3,4)

Traceback (most recent call last): 

File "<stdin>", line 1, in ? 

TypeError: bleh() takes at most 3 arguments (4 given) 

Instead of a colon, you can use a semicolon followed by a string to be used as the 
error message: 

if (!PyArg_ParseTuple(pArgs, "iii;Doh!", &i , &j, &k)) 
return NULL; 

Now a call with bad arguments (here, a list where an integer is expected) yields the following:

>>> myFunc(1,2,[5])

Traceback (most recent call last): 

File "<stdin>", line 1, in ? 

TypeError: Doh! 
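To see how the two suffixes are told apart, here is a small hypothetical helper in plain C; format_suffix is an illustrative name, not part of the Python/C API. It finds the first : or ; in a format string and returns the text after it. That text is either a function name to use in error messages or a complete replacement message.

```c
#include <string.h>

/* Hypothetical helper: locate the ':' (function name) or ';' (replacement
   error message) section of a format string. Sets *kind to ':' or ';' and
   returns the text that follows it; returns NULL and sets *kind to '\0'
   if the format has neither. */
const char *format_suffix(const char *fmt, char *kind)
{
    const char *p = strpbrk(fmt, ":;");   /* first ':' or ';', if any */

    if (p == NULL) {
        *kind = '\0';
        return NULL;
    }
    *kind = *p;
    return p + 1;
}
```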


Unpacking keyword arguments 

In order to handle keyword arguments, you first need to change the function’s entry in 
the module function table from METH_VARARGS to METH_VARARGS | METH_KEYWORDS: 

static PyMethodDef mymodule_methods[] =
{
    {"func", mymodule_func, METH_VARARGS | METH_KEYWORDS},
    {NULL, NULL} // End of functions
};

The C function takes a third parameter to hold the keyword arguments, and you call
PyArg_ParseTupleAndKeywords to unpack the arguments, passing in a NULL-terminated
array containing the names of the arguments. The following example accepts three
keyword arguments, one of which is optional:

// Argument names
static char *ppNames[] = {"name","age","weight",NULL};
static PyObject *simple_kwd(PyObject *pSelf, PyObject *pArgs,
                            PyObject *pKwds)
{
    char * pName;
    int age;
    int weight = -1; // weight is optional, so set a default

    if (!PyArg_ParseTupleAndKeywords(pArgs, pKwds, "si|i",
                                     ppNames, &pName, &age,
                                     &weight))
        return NULL;

    printf("Name: %s Age: %d Weight: %d\n",
           pName, age, weight);
    return Py_BuildValue("");
}


The format string must have an entry for each entry in the list of names (ppNames),
and the list of names must end with a NULL member. Following are some sample
calls to this function:

>>> simple.kwd('Bob',5)
Name: Bob Age: 5 Weight: -1
>>> simple.kwd(age=10,name='Beable')
Name: Beable Age: 10 Weight: -1
>>> simple.kwd('Fred',weight=150,age=25)
Name: Fred Age: 25 Weight: 150
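The NULL at the end of ppNames matters because the array carries no length of its own. This hypothetical plain-C helper shows the usual way such an array is searched (kwd_index is an illustrative name, not an API function). It walks the array until it reaches the NULL sentinel, comparing each name with the keyword being looked up.

```c
#include <string.h>

/* Hypothetical helper: return the position of key in a NULL-terminated
   array of names such as ppNames, or -1 if the keyword is unknown.
   The NULL sentinel is the only thing that marks the end of the array. */
int kwd_index(char *names[], const char *key)
{
    int i;

    for (i = 0; names[i] != NULL; i++)
        if (strcmp(names[i], key) == 0)
            return i;
    return -1;
}
```

With the ppNames array above, age is found at position 1 and weight at position 2, the same positions those values occupy in the "si|i" format.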

Unpacking zero arguments 

If your C function takes no arguments, you should still call PyArg_ParseTuple with
an empty format string to make sure no one calls your function incorrectly:

if (!PyArg_ParseTuple(pArgs, "")) 
return NULL; 

Note There is also a utility macro, PyArg_NoArgs(pArgs), that does the same thing,
but as of Python 2.1, it requires that the function's entry in the module function
table use an obsolete form of argument passing, METH_OLDARGS.


Converting C Data to Python 

The Py_BuildValue(format, ...) function does the opposite of
PyArg_ParseTuple, creating a Python object from C values. It is very common to
use a call to this function when returning to Python from your C function. The following
example uses Py_BuildValue to create a function that takes no parameters
and returns a Python string object with the value ‘Hello!’:
static PyObject *simple_hello(PyObject *pSelf, PyObject *pArgs)
{
    if (!PyArg_ParseTuple(pArgs, ""))
        return NULL;

    return Py_BuildValue("s", "Hello!");
}


Cross-Reference Besides Py_BuildValue, you can use functions in the concrete object layer to
convert from C data types. Chapter 30 covers functions such as
PyInt_FromLong, which creates a Python integer object from a C long value.

Creating simple Python objects 

Py_BuildValue takes a format string and the necessary C values to populate the
Python object. Table 29-2 lists the characters you can use in the format string.


Table 29-2
Py_BuildValue Object Types

Format     C Type               Python Object
i          int                  integer
b          char                 integer
h          short                integer
l          long                 integer
f          float                floating-point number
d          double               floating-point number
c          char                 1-character string
s or z     char *               string
s# or z#   char *, int          string
S          PyStringObject *     new string object
O          PyObject *           object with reference count incremented
O&         converter, any       new object passed through converter function
N          PyObject *           object with reference count unchanged
u          Py_UNICODE *         new Unicode object
u#         Py_UNICODE *, int    new Unicode object
U          PyUnicodeObject *    new Unicode object
s, z, and u take NULL-terminated strings and convert them to Python strings; the
forms that also take a length parameter can have embedded NULLs.

With string conversion (for example, s, s#, u), empty C strings convert to empty
Python strings, and NULL pointers in C are returned as None in Python. Any time
you pass a string or memory buffer to Py_BuildValue, it copies the data passed in,
so it’s immediately safe to destroy whatever buffers you were using to hold your
original copy of the data.

Py_BuildValue raises PyExc_SystemError and returns NULL if any problems
occur during conversion. Likewise, a conversion function used with the O& format
should return a new Python object if possible, or raise an exception and return
NULL on error.

Tip Unlike PyArg_ParseTuple, Py_BuildValue lets you add whitespace, colons, and commas
to its format string. They do not affect the value returned, but they help
improve C code readability.

With an empty format string, Py_BuildValue returns the None object; with a single
format specifier, it returns an object of that type; and with two or more, it returns a
tuple containing the Python objects. This mirrors a Python return statement: a bare
return gives None, and return a, b gives a tuple.
In order to force Py_BuildValue to return a tuple containing 0 or 1 objects, wrap
the formats in parentheses:

Py_BuildValue("()"); // Creates an empty tuple
Py_BuildValue("(i)",5); // Creates the tuple (5,)

Tip A slightly more efficient idiom for returning None is

Py_INCREF(Py_None);
return Py_None;

Creating complex Python objects 

In addition to atomic Python objects, you can use Py_BuildValue to create
sequence and mapping objects too. This function call creates a tuple containing a
list and another tuple:

// Creates ([5, 6, 7], ('a', 'b'))

Py_BuildValue("[iii](cc)",5,6,7,'a','b');

You can nest sequences to create complex objects as needed: 

// Creates ([(1,), (2,), [3, 4]], (5, [6]))

Py_BuildValue("[(i)(i)[ii]](i[i])",1,2,3,4,5,6);

Dictionaries are simple to make; each pair of C values forms a key-value pair:

// Creates {2: 2.5, 1: 'one'} 

Py_BuildValue("{i:s,i:f}",1,"one",2,2.5);
Embedding the Interpreter

Instead of extending Python with C, sometimes it’s advantageous to extend a C program
with Python.

A simple example 

Once you have extension modules under your belt, embedding the Python interpreter
in a C program is a cinch:

#include "Python.h"

int main(int argc, char ** argv)
{
    Py_Initialize(); // Prepare the interpreter
    PyRun_SimpleString("print 'Hello from Python!'\n");
    Py_Finalize(); // Clean up resources used
    return 0;
}


The build steps are similar to those for extension modules: Modify the include and 
link paths to get the Python files, and link in Python’s library. Instead of creating a 
shared library, of course, your project or Makefile should create a stand-alone 
executable. 

With the exception of a few setup and threading functions, Py_Initialize() is the
first Python API function that your program should call, because it prepares the
interpreter for operation, which includes setting up built-in modules such as
__builtin__ and __main__. Call Py_Finalize() to free the resources used by the
Python subsystem; after Py_Finalize has been called, you need to call
Py_Initialize again if you want to run more Python code without restarting your
program. If your program is unsure of the current state, it can at any time call
Py_IsInitialized() to check.

PyRun_SimpleString is one of many functions you can use to actually execute
the Python code; “Running Python Code from C,” later in this chapter, has more
information.

Caution Py_Finalize does not unload dynamically loaded extension modules; those
stay around until your program terminates.

Shutting down 

At any time, you can call Py_FatalError(char *message) to print an error message
to stderr and kill the current process without performing any cleanup. The
process exits with a call to the abort() function in the standard C library, so on
UNIX systems it will attempt to create a core file.
For normal exiting, call Py_Exit(int code) to gracefully shut down the current
process. Py_Exit calls Py_Finalize first, and then calls the C exit(code) function
using the exit code you supply.

Use Py_AtExit(func) to register a shutdown function that will be called by
Py_Finalize. Your shutdown function should take no arguments and return no
value. Py_AtExit returns 0 if successful, or -1 if you try to register more than 32
functions. Each shutdown function is called only once per call to Py_Finalize, and
the functions are called in the reverse of the order in which they were registered (LIFO).

Py_Finalize does all of its own cleanup work before calling the shutdown functions,
so your functions should not use Python/C API calls.
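The registration rules for Py_AtExit (at most 32 functions, each run once, newest first) amount to a small stack of function pointers. The sketch below models that behavior in plain C. exit_registry, register_exit_func, run_exit_funcs, and the two cleanup functions are hypothetical names, and this is not Python's own implementation.

```c
#include <string.h>

#define MAX_EXIT_FUNCS 32   /* the registration limit given for Py_AtExit */

typedef void (*exit_func)(void);   /* no arguments, no return value */

/* Hypothetical stand-in for the table Py_AtExit maintains. */
typedef struct {
    exit_func funcs[MAX_EXIT_FUNCS];
    int count;
} exit_registry;

/* Register a function. Returns 0 on success, -1 if the table is full. */
int register_exit_func(exit_registry *r, exit_func f)
{
    if (r->count >= MAX_EXIT_FUNCS)
        return -1;
    r->funcs[r->count++] = f;
    return 0;
}

/* Call each registered function once, newest first (LIFO),
   leaving the table empty afterward. */
void run_exit_funcs(exit_registry *r)
{
    while (r->count > 0)
        r->funcs[--r->count]();
}

/* Two demo shutdown functions that record the order in which they ran. */
char exit_log[8] = "";
void cleanup_a(void) { strcat(exit_log, "A"); }
void cleanup_b(void) { strcat(exit_log, "B"); }
```

Registering cleanup_a and then cleanup_b leaves "BA" in exit_log after run_exit_funcs, and a 33rd registration is refused with -1.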

Other setup functions

By default, the program’s name (the equivalent of argv[0]) is ‘python’, but you can
change that with a call to Py_SetProgramName(char *name), which must be called
before Py_Initialize. The program’s name is used internally to help locate run-time
libraries. Py_SetProgramName does not copy the string but keeps a pointer to
it, so the string must remain valid while the interpreter uses it. You can call
Py_GetProgramName() to get this value.
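Because Py_SetProgramName stores the pointer rather than a copy, the caller's string has to outlive its use by the interpreter. This hypothetical sketch shows what keeping a pointer means; set_program_name and get_program_name are illustrative stand-ins, not the real API functions. A later change to the caller's buffer is visible through the getter.

```c
/* Hypothetical stand-ins that mimic the pointer-keeping behavior
   described for Py_SetProgramName/Py_GetProgramName. */
static const char *program_name = "python";   /* placeholder default */

void set_program_name(const char *name)
{
    program_name = name;   /* no copy: the caller's buffer must stay valid */
}

const char *get_program_name(void)
{
    return program_name;   /* the very pointer that was passed in */
}
```

Passing argv[0] or a string literal avoids the problem, since both remain valid for the life of the program; a local array that goes out of scope does not.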

Use PySys_SetArgv(int argc, char **argv) to set the command-line parameters
for the Python interpreter (sys.argv). This call must follow Py_Initialize, and
in current versions, if you don’t call PySys_SetArgv, the sys module will not have
an argv member at all.

Py_SetPythonHome(char *) lets you programmatically override or set the value of
the PYTHONHOME environment variable. Use Py_GetPythonHome() to retrieve the
current value, which is empty by default.

System information functions 

Many functions return information about the program’s operating environment; this
section describes the more useful ones. Note that these are not specific to embedded
interpreters but can also be used from extension modules.

Py_GetProgramFullPath() returns a pointer to a string representing the complete
path to the currently running executable (either the normal Python interpreter or
an application that embeds the interpreter).

To access the default module search path, call Py_GetPath(). The returned
pointer refers to the list of directories used to initialize sys.path, separated by the
system path delimiter character (for example, : on UNIX). Although you can modify
the search path from Python via sys.path, do not modify the value returned from
Py_GetPath.
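Since the value from Py_GetPath must not be modified, code that wants the individual directories should split a copy. The sketch below is one way to do that in plain C; split_path is a hypothetical helper, and the sample path used to exercise it is made up. It copies the string, then replaces each delimiter in the copy with a NULL so that every entry becomes its own C string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a delimiter-separated search path (such as
   the string Py_GetPath returns) into at most max entries. The input is
   copied first so that it is never modified. On success, *storage points
   to the copy, which the caller must free(); the entries point into it.
   Returns the number of entries, or -1 if memory runs out. */
int split_path(const char *path, char delim,
               char *entries[], int max, char **storage)
{
    char *p;
    int n = 0;

    *storage = malloc(strlen(path) + 1);
    if (*storage == NULL)
        return -1;
    strcpy(*storage, path);

    p = *storage;
    while (n < max) {
        char *end = strchr(p, delim);

        entries[n++] = p;          /* this entry starts at p */
        if (end == NULL)
            break;                 /* last entry is already NUL-terminated */
        *end = '\0';               /* terminate this entry in the copy */
        p = end + 1;
    }
    return n;
}
```

Pass ':' as the delimiter on UNIX; Windows uses ';'.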

Py_GetVersion() returns a pointer to a string showing the version of the Python
interpreter:

2.1 (#9, Jan 1 2001, 02:49:28) [MSC 32 bit (Intel)]

This is the same string displayed when starting an interactive Python session, and
is accessible from Python as sys.version.
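An embedding program that needs the major and minor version numbers can parse them from the front of this string. This is a hypothetical plain-C helper (parse_version is not part of the Python/C API) that assumes, as in the sample above, that the string begins with major.minor.

```c
#include <stdio.h>

/* Hypothetical helper: read the leading "major.minor" from a version
   string such as the one Py_GetVersion returns.
   Returns 1 on success, 0 if the string does not start that way. */
int parse_version(const char *version, int *out_major, int *out_minor)
{
    return sscanf(version, "%d.%d", out_major, out_minor) == 2;
}
```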

Py_GetPlatform() returns a platform identifier string such as win32 or freebsd4. If
Python can’t determine the platform name, this function returns the string
“unknown.” Python code accesses this value as sys.platform.

Py_GetPrefix() returns the path prefix for installed platform-independent files,
and Py_GetExecPrefix() returns the path prefix for installed platform-dependent
files. For example, if the program name is /usr/local/bin/python, the prefix is
/usr/local, although the values are actually calculated based on the program
name and environment variables. These values are available from Python as
sys.prefix and sys.exec_prefix. On UNIX, they refer to the --prefix and
--exec-prefix Makefile settings; on Windows, they are empty.


Running Python Code from C 

The abstract object layer covered in the next chapter has functions, such as
PyObject_CallFunction, that let C extension functions call Python functions
directly, just like a normal function call in a Python program. In some cases, however,
you might not need such direct, low-level communication between Python and
C. If your C program just needs to execute some Python code without much interaction
with the Python interpreter, you can use the functions listed in this section
instead.

As shown in the previous section’s example, PyRun_SimpleString(char *command)
executes a string containing one or more lines of Python code. The function
returns 0 if successful and -1 if an unhandled exception was raised, although
there’s no way to retrieve information about what exception it was.

PyRun_SimpleString runs the code in the __main__ module, creating the module
first if needed.

PyRun_SimpleFile(FILE *f, char *fname) works just like
PyRun_SimpleString, except that it uses the contents of the file f as the code to
execute. fname is the name of the file being used.

PyRun_InteractiveOne(FILE *f, char *fname) waits for and then executes a
single statement from f, which is a file representing an interactive device. fname is a
name to be used when printing out error messages.

PyRun_InteractiveLoop(FILE *f, char *fname) repeatedly calls
PyRun_InteractiveOne until the end of file is reached. The following code creates
an interactive interpreter, somewhat similar to the one you get when you load the
Python executable in interactive mode:
#include "Python.h"

int main(int argc, char ** argv)
{
    Py_Initialize();
    Py_Exit(PyRun_InteractiveLoop(stdin, "<stdin>"));
}


PyRun_AnyFile(FILE *f, char *fname) is a utility function that calls
PyRun_InteractiveLoop if f is attached to an interactive device, and
PyRun_SimpleFile if it is not. This function uses Py_FdIsInteractive(FILE *f,
char *fname) to decide which to call.
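The choice PyRun_AnyFile makes comes down mainly to whether the FILE is attached to a terminal. The following is a rough, hypothetical approximation in POSIX C (is_interactive is an illustrative name). Python's own Py_FdIsInteractive may also consult other settings, so treat this as a sketch of the idea only.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and isatty() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical approximation: treat a FILE as interactive when its
   underlying descriptor refers to a terminal. */
int is_interactive(FILE *f)
{
    return isatty(fileno(f)) ? 1 : 0;
}
```

A regular file or /dev/null reports 0, while stdin in a terminal session normally reports 1.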

If you have a block of code that you intend to execute many times, you can parse it
a single time and create a code object that stores the code in its ready-to-execute
form so that later executions will be quicker. PyParser_SimpleParseString(char
*command, int start) parses code from a string in memory, and
PyParser_SimpleParseFile(FILE *f, char *fname, int start) parses code
from the file you provide. The start parameter tells the parser what sort of code it’ll
be parsing; legal values are described in Table 29-3.


Table 29-3
Grammar start Codes

Code              Use if the Python code to parse is...           Example
Py_eval_input     an isolated expression                          x*6
Py_single_input   a single statement                              print blue
Py_file_input     a sequence of statements (an entire program)


Both of these functions return a pointer to a newly allocated node structure, which
you can then convert into a code object:


#include "Python.h"
#include "compile.h"
#include "node.h"

PyCodeObject *co;
node * n;

n = PyParser_SimpleParseString("print 'Hello'",
                               Py_single_input);
if (n) // NULL means the code could not be parsed
{
    co = PyNode_Compile(n, "<stdin>");
    PyNode_Free(n);
    if (co)
    {
        ... // Do some work here
        Py_DECREF(co);
    }
}


For a shortcut that does the same thing, call Py_CompileString(char *cmd, char
*fname, int start), which returns a new reference to a code object. This example
creates a code object that prints ‘Hello’, and then executes the code object 10 times
(it uses functions you won’t learn until the next chapter, so don’t worry if it looks a
little strange):

#include "Python.h"
#include "compile.h"
#include "eval.h"

int main(int argc, char ** argv)
{
    PyObject *co, *m;

    Py_Initialize(); // Setup
    m = PyImport_AddModule("__main__"); // Force creation of main
    co = Py_CompileString("print 'Hello'", "<stdin>",
                          Py_single_input);

    if (co && m)
    {
        int i;
        PyObject *d = PyModule_GetDict(m); // Get main dictionary

        // Repeatedly execute the code object
        for (i = 0; i < 10; i++)
        {
            PyObject * res = PyEval_EvalCode((PyCodeObject *)co,
                                             d, d);
            Py_XDECREF(res);
        }

        Py_DECREF(co); // We're done with this object!
    }

    Py_Exit(0);
}


If you only need to evaluate a string, use the PyRun_String(char *cmd, int
start, PyObject *globals, PyObject *locals) function, which returns a new
reference to a Python object containing the result of running the code. globals and
locals are Python objects that reference the global and local dictionaries in which
to run the code. The following example takes a Python expression as a string, evaluates
it, and converts the result to a C integer:

#include "Python.h"

int main(int argc, char ** argv)
{
    PyObject *m, *d, *result;
    char * cmd = "2 * 11";

    Py_Initialize(); // Set up
    m = PyImport_AddModule("__main__");
    d = PyModule_GetDict(m); // Get dictionary to use

    // Evaluate it and get a PyObject back
    result = PyRun_String(cmd, Py_eval_input, d, d);
    if (result == NULL)
    {
        PyErr_Print(); // Report the exception and give up
        Py_Exit(1);
    }

    // Convert the PyObject to a C integer and print it
    printf("%s is %d\n", cmd, (int) PyInt_AsLong(result));
    Py_DECREF(result); // PyRun_String returned a new reference

    Py_Exit(0);
}


The result is printed on stdout:

2 * 11 is 22 

Finally, if your input is coming from a file, you can do the same thing with
PyRun_File(FILE *f, char *fname, int start, PyObject *globals, PyObject *locals).

Tip PyRun_AnyFile, PyRun_SimpleFile, and PyRun_File all have extended versions
(for example, PyRun_AnyFileEx) that take an additional integer parameter, closeit,
which, if non-zero, tells the function to close the file when finished.


Using Extension Tools 

Writing code to create the interface between Python and C is generally very
straightforward and, therefore, often boring. After you’ve been spoiled by development
in Python, you may find that your extension modules have bugs because you
have to manually manage object reference counts. Fortunately, several popular
tools are available to help you automate these tasks.

Tip In addition to the tools mentioned here, the Vaults of Parnassus have others that
you can try too. Visit the Python Tools/Extensions section at
http://www.vex.net/parnassus.


SWIG 

The Simplified Wrapper and Interface Generator (SWIG) is a development tool
designed to connect C and C++ programs with high-level languages, including, but
not limited to, Python, Perl, and Tcl/Tk. It is especially useful for creating an extension
module from an existing C or C++ library. In some cases, it can generate all the
interface code automatically. SWIG is free for commercial and private use and is
available from www.swig.org.
Some areas in which SWIG shines include the following:

- Wrapping existing libraries (your own or third-party libraries)
- Rapid prototyping and application development
- Interactive debugging (use your library in an interactive Python session)
- Regression testing (Python scripts that test your C/C++ code)
- Creating a GUI front-end in Python for an underlying C program

Using SWIG 

To use SWIG, you create an interface file that lists the variables and functions you
want to be available to Python. The format of the file is very C-like (it’s read by a C
preprocessor), and in some cases you can even use your source code as the interface
file itself.

Once the interface file is ready, you run SWIG to generate the wrapper code. You 
then compile the wrapper code and link in your original C code, and you end up 
with a ready-to-use Python module. 


A SWIG example 

SWIG has many features, but the following simple example gives you an idea of what it does. Suppose I want to create (or have already created) a C library called useless and I want to be able to access its powerful features from Python. The source code is in useless.c:

#include <stdio.h>

int getnum()
{
    return 42;
}

void message()
{
    printf("Hello, SWIG!\n");
}

int addem(int j, int k)
{
    return j + k;
}


The next step is to create the interface file called useless.i:

%module useless
%include useless.c

The %module directive says that the finished module will be called useless. The rest of the file contains C variable and function declarations; because the original source code is clean, I decide to pass it all to SWIG verbatim. Alternatively, I could have used the following:

%module useless

int getnum();
void message();
int addem(int j, int k);

Often, the second form is what you'll use, because you might not want every library function exported to Python, and the interface file lets you add features specific to the Python version of your library.

The next step is to run SWIG and generate the wrappers: 

/home/dave/swig> swig -python useless.i
Generating wrappers for Python 

SWIG creates useless_wrap.c and useless_wrap.doc (a documentation file). The -python argument selects Python as the output language. Now it's time to build the module:

/home/dave/swig> gcc -shared useless.c useless_wrap.c -o \
    uselessmodule.so -I/usr/local/include/python2.0 \
    -DHAVE_CONFIG_H -I/usr/local/lib/python2.0/config

If you do this more than once, you’ll obviously want to wrap this into a Makefile. 

The SWIG Web site also has instructions for using Microsoft Developer Studio. 

The new module is complete. Here’s a test: 

>>> import useless
>>> dir(useless)
['__doc__', '__file__', '__name__', 'addem', 'getnum', 'message']
>>> useless.addem(10,5)
15
>>> useless.message()
Hello, SWIG!
>>> useless.getnum()
42

Other nifty features 

SWIG works with both C++ classes and templates, but for some C++ features you have to put forth a little extra effort to get them to work (it's all well documented, but a little less intuitive). SWIG does a great job of making the common case fast (the creators of SWIG cite the example of creating a Python module for OpenGL in 15 minutes or so); more complex C++ features require more work to make them callable from Python.
With structures and classes, the Python equivalents become ClassName_methodname. The print(msg) method in a List class would be List_print(instance, msg), which is sort of clunky, but SWIG's -shadow command-line option has it create a shadow class that makes the Python version easier to use; for example, myList.print(msg).

Your interface file can also implement the special methods that Python classes use. For example, you could implement in C a __getitem__ method to handle indexing (item access).

One final feature worth mentioning here is a typemap, which automatically handles conversion to and from Python data types when calling your C functions. For example, if you have a C writeToFile function that takes a C FILE pointer, you can create a typemap that converts between FILE pointers and Python file objects. Then, without changing the original C code, Python routines can pass in file objects to any C functions that expect FILEs.


CXX

CXX is a set of C++ facilities that helps you write Python extensions easily and with fewer bugs. You can download CXX from http://cxx.sourceforge.net. CXX has two main parts: CXX_Objects and CXX_Extensions.

Note SCXX (Simplified CXX) is another Python/C++ API library that is also free to use for commercial and private applications. Its main purpose is to wrap Python objects and manage reference counts, and it stays away from C++ features found only in newer compilers. Visit http://www.mcmillan-inc.com for more information.

CXX_Objects 

The main idea behind CXX_Objects is that too much of the work of writing Python extension modules deals with checking return codes for errors and managing reference counts, and that using the standard Python/C API is too low-level.

CXX_Objects is a set of high-level C++ classes (named Float, Tuple, String, Dict, and so on) that wrap their Python counterparts. Their constructors and destructors keep track of reference count details; and as a group, they use C++ exceptions to signify error conditions and as a cleanup mechanism. In short, CXX_Objects makes writing C++ extension modules cleaner by using the features that make C++ a higher-level language than C.

Because you rarely use PyObject pointers directly, and instead use high-level wrapper objects, your extension module code is more “Pythonic,” less buggy, and easier to maintain.

Unhandled Python API errors or uncaught CXX exceptions are automatically converted to Python exceptions and passed back to the caller, although you always have the option of handling the exceptions yourself.
CXX_Extensions

CXX_Extensions is a more recent addition to CXX. As with CXX_Objects, the motivation behind it is that the Python/C API way of doing things can be improved upon by using the features of C++.

A garden-variety C extension module has numerous static functions and a single 
public initialization function that is called when the module is imported. The init 
function passes back to Python a table that has pointers to the various functions 
implemented by the module. 

In the CXX_Extensions approach, every extension module is a C++ class derived from ExtensionModule (a base class template). Each function in your module is implemented as a method of that class.

CXX_Extensions also includes PythonExtension, a C++ class from which you derive 
new Python extension types. Unlike other objects, PythonExtension objects can be 
created either on the Python heap or in automatic (stack) storage. Creating and 
destroying objects on the heap can be relatively expensive, so programs that 
can create and use PythonExtension objects created on the stack enjoy better 
performance. 

Extension classes 

Although you can create new Python types in a C extension module, Python types 
in general aren’t very object-oriented (you can’t directly subclass floating point 
numbers, for example). Digital Creations (the maker of Zope) has created extension 
classes, which are Python extension types that look and act as if they were really 
classes. 

With extension classes, you can create an extension base class in C or C++ and then subclass it in Python. As with normal classes, it is trivial to create new instances of them, and they even work with multiple inheritance (for example, a Python subclass derived from both a C extension class and a Python class).

One advantage of extension classes is that instance data can be stored in a dictionary as usual, as instance data in a C++ object, or some combination of the two. You could have a few special members stored in a C struct for performance reasons, for example, and let the other object attributes remain in the instance dictionary.

Extension classes also enable you to invoke unbound methods (unlike normal 
Python classes, for which you need an instance object in order to use them). 

You can download extension classes from www.digicool.com.
Summary

This chapter introduced what you need to know to begin using Python with C. By reading this chapter, you learned how to:

- Write a complete C extension module.
- Pass data from Python and convert it to C data types.
- Return data from a C extension to a Python program.
- Use popular third-party packages to automatically generate Python-C interface code.
- Embed the interpreter in a C program.

Chapter 30 finishes our coverage of the Python/C API. In it, you’ll learn about the 
different object layers available and how you can properly handle and report errors 
from C. 
Embedding the Python Interpreter

This chapter is the second of a two-part look at the Python/C API. Whereas the previous chapter introduced the concepts of using Python and C/C++ together, this chapter is a reference for the functions used to manipulate Python objects from C. It also deals with other issues, such as error handling and C memory management.


Tracking Reference Counts 

Each Python object always knows how many variables reference it; and when there are no remaining variables, the object magically goes away. To remind you how much you enjoy programming in Python, when using the C API, you have to do some of the reference counting yourself. Each time you use a PyObject pointer (or one of its subtypes), you need to track the type of reference ownership that goes along with that object pointer. There's nothing in the code itself that contains this information; the Python/C API has a few documented terms and conventions that act as guidelines.

Types of reference ownership 

Suppose you have a PyObject pointer named x. You use the Py_INCREF(x) macro to tell Python, “Hey, I'm using the object pointed to by x; I need a new reference to it”; and you use Py_DECREF(x) to say, “I'm done with (my reference to) x.” If you don't do this, it's quite possible that somewhere else the last reference to x is released, the object is cleaned up, and you're left with a pointer to memory that has already been freed. Your well-behaved C extension module suddenly becomes the cause of strange crashes and other evils.
The first type of reference is an owned reference. If your block of code owns a reference to an object, then it's safe to store the object pointer, for example, because the object won't be destroyed at least until you release your reference to it. One way to gain a reference is to call Py_INCREF; another is to receive a new reference returned by a function.

You can cease to own a reference by calling Py_DECREF or by transferring the reference to someone else. When you own a reference and call a function that assumes ownership of that reference, that function is said to steal the reference from you.

The second type of reference is a borrowed reference. Because you're using C pointers to Python objects, it's possible to pass objects to another function that doesn't modify the reference counts at all. If this is intentional, such a function is said to borrow references to objects (otherwise, it's a bug). One case in which this is safe is in a function that uses the object pointer to perform some operation and then returns control to the caller without storing the pointer or modifying the reference count. If the caller owns a reference, then the object is guaranteed to exist until program control is returned to the caller. The rule here is that the borrower can't use the reference any longer than the true owner does. To change a borrowed reference to an owned reference, simply call Py_INCREF.

Note Although these references are called borrowed references, they aren't really borrowed at all, as the caller still owns the reference. Another way to think of it is that the reference owner gives permission to another function to access the object temporarily.


To sum it up, you become an owner of a reference by calling Py_INCREF or by receiving a new reference from someone else. There are two legal ways to rid yourself of an owned reference: decrease the reference count or give the reference to someone else. Anything else is a reference leak and, in turn, a memory leak (and potentially a leak of other resources).


Tip Py_XINCREF and Py_XDECREF modify reference counts, but first check to make sure the object is non-NULL.


Reference conventions 

The Python/C API documentation specifies whether references are borrowed or owned (although occasionally it doesn't hurt to look at the function source code just to convince yourself).


As a general rule of thumb, the API functions that return some type of PyObject pointer return a new reference to that Python object. For example, many C extension functions return with a call to Py_BuildValue; it returns a new reference to you, which you pass on to the caller of your function. The main exceptions to this rule (we'll remind you again later) are PyTuple_GetItem and PyList_GetItem, which return borrowed references to tuple and list items.
When you pass object references to functions, those functions generally borrow the references (if they want a reference, they'll call Py_INCREF). The main exceptions are PyTuple_SetItem and PyList_SetItem, which steal references from their callers.

Along the same lines, C functions called from Python borrow references to their 
arguments because the objects are basically guaranteed to exist until the function 
returns. If you need to store a pointer to an object, however, be sure to grab your 
own reference to it. 

Common pitfalls

Usually, tracking reference counts isn’t too much hassle, but two very common but 
subtle bugs are confusing enough to warrant mention here. Don’t be surprised if 
you run into these exact bugs or close variants. 

One common mistake occurs in multithreaded programs. There is a global lock (discussed in the “Creating Threads and Sub-Interpreters” section later in this chapter) that the current thread must hold in order to operate on Python objects. Before potentially long operations (a blocking socket call, for example), it is customary to release this lock so that other threads can do work. Problems arise if the other threads end up deleting the last reference to an object to which you have only a borrowed reference. In this case, when you regain control of the global lock, your once-valid object has been deleted. The solution is to increment the reference count before releasing the lock and decrement it on return; even if no other references remain, your owned reference will still exist.

A similar problem occurs when an object's reference count is implicitly decremented. Calling PyList_SetItem, for example, puts an object in a list (like the Python code a[5] = 'hello'). The object originally in that position is replaced, its reference count is decremented by 1, and, if that was the last reference, the object is deleted. Any borrowed references you have to that object (perhaps from a call to PyList_GetItem) may now be bogus. Again, the solution is to explicitly get a new reference to the object so that you can guarantee that it is not destroyed too soon.

Using the Abstract and Concrete Object Layers

The Python/C API contains functions to perform all possible operations on Python objects. The functions used to manipulate objects are organized hierarchically according to object type, in two main layers: abstract and concrete.
Object layers

The abstract object layer is an API layer that enables you to work with general categories of objects. For example, you can call PyNumber_Add to add two numbers without worrying too much about their exact type. The abstract layer also has functions for dealing with sequence types and mapping types.

The concrete object layer has functions specific to each type of object. PyFloat_Check checks to see if an object is a floating-point number, for example; and PyComplex_FromDoubles creates a complex number object from two C doubles.

Cross-Reference In general, API functions that return a pointer use NULL to denote an error, and those that return an integer use -1 for errors. In both cases, the functions also set an error condition so that the Python caller ends up with an exception. See "Handling Errors and Exceptions" later in this chapter for more information about error handling.

If you have a choice between functions in the abstract and concrete object layers, 
use the more general abstract functions, for greater flexibility. 

Working with generic objects 

At the top of the hierarchy is a group of general-purpose functions for working with any Python object. Table 30-1 lists common object operations in Python and their corresponding C function calls. Calls that return PyObject pointers return a new reference unless otherwise noted.



Table 30-1
General Object Functions

Python                   Equivalent C Function Call
repr(o) or `o`           PyObject *PyObject_Repr(PyObject *o)
str(o)                   PyObject *PyObject_Str(PyObject *o)
unicode(o)               PyObject *PyObject_Unicode(PyObject *o)
len(o)                   int PyObject_Length(PyObject *o)
hasattr(o, name)         int PyObject_HasAttrString(PyObject *o, char *name)
                         int PyObject_HasAttr(PyObject *o, PyObject *name)
getattr(o, name)         PyObject *PyObject_GetAttrString(PyObject *o, char *name)
                         PyObject *PyObject_GetAttr(PyObject *o, PyObject *name)
o.name = v               int PyObject_SetAttrString(PyObject *o, char *name, PyObject *v)
                         int PyObject_SetAttr(PyObject *o, PyObject *name, PyObject *v)
del o.name               int PyObject_DelAttrString(PyObject *o, char *name)
                         int PyObject_DelAttr(PyObject *o, PyObject *name)
cmp(o1, o2)              int PyObject_Compare(PyObject *o1, PyObject *o2)
                         int PyObject_Cmp(PyObject *o1, PyObject *o2, int *result)
                         PyObject *PyObject_RichCompare(PyObject *o1, PyObject *o2, int op)
o[key]                   PyObject *PyObject_GetItem(PyObject *o, PyObject *key)
o[key] = val             int PyObject_SetItem(PyObject *o, PyObject *key, PyObject *val)
del o[key]               int PyObject_DelItem(PyObject *o, PyObject *key)
print >> fp, `o`         int PyObject_Print(PyObject *o, FILE *fp, 0)
print >> fp, o           int PyObject_Print(PyObject *o, FILE *fp, Py_PRINT_RAW)
type(o)                  PyObject *PyObject_Type(PyObject *o)
hash(o)                  long PyObject_Hash(PyObject *o)
not not o (is o true?)   int PyObject_IsTrue(PyObject *o)
callable(o)              int PyCallable_Check(PyObject *o)


The PyObject_RichCompare function compares two objects using the comparison you specify in the op parameter. If neither object supports the necessary comparison function, Python compares them using its own methods. The op parameter can be any of the following constants, which correspond to the rich comparison function names:

Py_LT (<, __lt__)
Py_LE (<=, __le__)
Py_EQ (==, __eq__)
Py_NE (!=, __ne__)
Py_GT (>, __gt__)
Py_GE (>=, __ge__)
To compare using an object's __lt__ method, for example, you'd call PyObject_RichCompare with an op of Py_LT. As of Python 2.1, the PyObject_Compare function checks for the presence of rich comparison functions before using Python's default comparison functionality.

PyObject_RichCompare is new in Python 2.1.


You can read more about the rich comparison functions in Chapter 7. 


The equivalent of apply(o, args) or o(*args) is PyObject_CallObject(PyObject *o, PyObject *args), where args is a tuple of arguments, or NULL if the function takes no arguments. PyObject_CallObject returns a new reference to an object containing the function call result.

PyObject_CallFunction(PyObject *o, char *format, ...) works the same way except that you use a Py_BuildValue-like format string to specify the argument types, or a NULL format string to denote no arguments. When calling methods of instance objects, use PyObject_CallMethod(PyObject *o, char *method, char *format, ...). Note that you can't call special methods (for example, __add__) this way; the API provides individual functions for calling those methods (for example, PyNumber_Add).

Tip Even if there isn't a public C API for a particular Python method, you can still call it using these functions. For example, mapping objects (such as dictionaries) have an items() method to return a list of key-value pairs, so you could invoke it as follows:

PyObject_CallMethod(o, "items", NULL)

PyObject_AsFileDescriptor(PyObject *o) is a utility function for getting an integer file descriptor from an object. If the Python object is an integer or long number, it returns its value. Otherwise, it returns the result from calling the object's fileno() method, if present.


Working with Number Objects 

The abstract object layer has the PyNumber family of functions for dealing with any numerical object; and the concrete layer has functions specific to integers, long integers, floating-point numbers, and complex numbers.

Any numerical type 

Use PyNumber_Check( PyObject *o) to determine whether a particular Python 
object supports numerical operations. 
Table 30-2 lists numerical operations in Python and their Python/C API equivalents. As usual, PyObject pointers returned from functions represent a new reference to a Python object, or are NULL to indicate an error.



Table 30-2
Numerical Functions

Python                   Equivalent C Function Call
a + b                    PyObject *PyNumber_Add(PyObject *a, PyObject *b)
a - b                    PyObject *PyNumber_Subtract(PyObject *a, PyObject *b)
a * b                    PyObject *PyNumber_Multiply(PyObject *a, PyObject *b)
a / b                    PyObject *PyNumber_Divide(PyObject *a, PyObject *b)
a % b                    PyObject *PyNumber_Remainder(PyObject *a, PyObject *b)
divmod(a, b)             PyObject *PyNumber_Divmod(PyObject *a, PyObject *b)
-a                       PyObject *PyNumber_Negative(PyObject *a)
+a                       PyObject *PyNumber_Positive(PyObject *a)
~a                       PyObject *PyNumber_Invert(PyObject *a)
abs(a)                   PyObject *PyNumber_Absolute(PyObject *a)
a << b                   PyObject *PyNumber_Lshift(PyObject *a, PyObject *b)
a >> b                   PyObject *PyNumber_Rshift(PyObject *a, PyObject *b)
a & b                    PyObject *PyNumber_And(PyObject *a, PyObject *b)
a | b                    PyObject *PyNumber_Or(PyObject *a, PyObject *b)
a ^ b                    PyObject *PyNumber_Xor(PyObject *a, PyObject *b)
int(a)                   PyObject *PyNumber_Int(PyObject *a)
long(a)                  PyObject *PyNumber_Long(PyObject *a)
float(a)                 PyObject *PyNumber_Float(PyObject *a)
a, b = coerce(a, b)      int PyNumber_Coerce(PyObject **a, PyObject **b)


The Python pow(a, b, c) function is accessible as PyNumber_Power(PyObject *a, PyObject *b, PyObject *c). The third parameter, c, can be a Python number object or Py_None.
For many of the functions in Table 30-2, there are corresponding functions for the in-place version of the same operation. For example, PyObject *PyNumber_InPlaceLshift(PyObject *a, PyObject *b) is the C way of doing a <<= b in Python.

Integers 

Python integer objects are represented in C by the PyIntObject structure. PyInt_Check(PyObject *o) returns 1 if the given object is an integer object (it has the type PyInt_Type).

PyInt_FromLong(long val) takes a C long integer and returns a new reference to a Python integer object. PyInt_AsLong(PyObject *o) converts an integer object back to a C long, coercing the object to a Python integer object first if needed. If you already know that it is an integer, you can use the PyInt_AS_LONG(PyObject *o) macro to do the same thing, but without coercion and error checking.

The largest value that can be stored in an integer object is defined as LONG_MAX in the header files; you can use PyInt_GetMax() to retrieve this value.

Longs 

Python long integers are stored in PyLongObject structures; their type is PyLong_Type (this is the same as types.LongType in Python), and you can call PyLong_Check(PyObject *o) to test whether an object is a long number object.

PyLong_FromLong(long val), PyLong_FromUnsignedLong(unsigned long val), and PyLong_FromDouble(double val) return a new reference to a long integer object having the given value.

One other way to create a new Python long object is by passing a character string to PyLong_FromString(char *str, char **end, int base). The base or radix of the number is specified by the base argument; values can be in the range from 2 to 36, or 0, which means that the function should look at the string itself to determine the base. It will use base 16 if the string starts with 0x or 0X, base 8 if it starts with 0, and base 10 otherwise. PyLong_FromString ignores leading spaces, and if end is not NULL, it stores in end the position of the first character after the end of the number.

PyLong_AsLong(PyObject *o) and PyLong_AsUnsignedLong(PyObject *o) convert long integer objects to C long and unsigned long variables. Because Python long integers can be any size, values that cannot be converted to C cause an OverflowError exception to be raised. PyLong_AsDouble(PyObject *o) returns the value of a long integer in a C double.
Floating-point numbers

Floating-point numbers are stored in PyFloatObject structures; their type is PyFloat_Type, and you can ensure that an object is a floating-point number by calling PyFloat_Check(PyObject *o).

PyFloat_FromDouble(double val) returns a new reference to a Python floating-point number object.

Given a Python floating-point number, you can convert it to a C double by calling PyFloat_AsDouble(PyObject *o). This function has some overhead due to error checking, so if you are already sure your object is a floating-point number, you can just call PyFloat_AS_DOUBLE(PyObject *o).

Complex numbers 

Python complex numbers live in PyComplexObject structures and have the type PyComplex_Type (equivalent to types.ComplexType in Python).

PyComplex_Check(PyObject *o) returns 1 if the given object is a Python complex number.

To create a complex number object, call PyComplex_FromDoubles(double real, double imag) to specify its real and imaginary components. You can also use PyComplex_FromCComplex(Py_complex c), which takes the structure by value. Py_complex is a C structure declared as follows:

typedef struct {
    double real;
    double imag;
} Py_complex;

Given a Python complex number object, you can extract its real and imaginary parts by calling PyComplex_RealAsDouble(PyObject *o) and PyComplex_ImagAsDouble(PyObject *o). Both functions return a C double. You can also call PyComplex_AsCComplex(PyObject *o), which returns both values in a Py_complex structure.


Working with Sequence Objects 

The functions in this section enable you to manipulate Python objects that are lists, 
tuples, and strings. 

When using functions that return slices, keep in mind that generally what is 
returned to you is a new reference, and that each item in it had its reference count 
incremented before it was sent back to you. 


562 Part V > Advanced Python Programming 


Any sequence type 

These functions are part of Python's abstract object layer and work on any sequence object. PySequence_Check(PyObject *o) returns a nonzero value if the object supports sequence functions.

Table 30-3 lists C function calls for corresponding sequence operations in Python. Functions that return a PyObject pointer return a new reference to that object or NULL if an error occurs. Those that return integers use a value of -1 to denote failure.



Table 30-3
C Sequence Functions

Python                   Equivalent C Function Call
len(s)                   int PySequence_Length(PyObject *s)
s[i]                     PyObject *PySequence_GetItem(PyObject *s, int i)
s[a:b]                   PyObject *PySequence_GetSlice(PyObject *s, int a, int b)
s[i] = v                 int PySequence_SetItem(PyObject *s, int i, PyObject *v)
s[a:b] = v               int PySequence_SetSlice(PyObject *s, int a, int b, PyObject *v)
del s[i]                 int PySequence_DelItem(PyObject *s, int i)
del s[a:b]               int PySequence_DelSlice(PyObject *s, int a, int b)
s1 + s2                  PyObject *PySequence_Concat(PyObject *s1, PyObject *s2)
s1 += s2                 PyObject *PySequence_InPlaceConcat(PyObject *s1, PyObject *s2)
s * count                PyObject *PySequence_Repeat(PyObject *s, int count)
s *= count               PyObject *PySequence_InPlaceRepeat(PyObject *s, int count)
s.count(v)               int PySequence_Count(PyObject *s, PyObject *v)
v in s                   int PySequence_Contains(PyObject *s, PyObject *v)
s.index(v)               int PySequence_Index(PyObject *s, PyObject *v)
tuple(s)                 PyObject *PySequence_Tuple(PyObject *s)
list(s)                  PyObject *PySequence_List(PyObject *s)


Two sequence functions perform less error checking to increase performance. PySequence_Fast(PyObject *o, const char *m) returns a new reference to o after converting it to a tuple, leaving it unchanged (except for the reference count) if it is already a tuple or a list. If the object o can't be converted to a sequence, the function returns NULL and raises a TypeError with m as the message text. You can then pass the returned object to PySequence_Fast_GET_ITEM(PyObject *o, int index) to get borrowed references to sequence members.

Caution PySequence_Fast_GET_ITEM assumes that the index values you pass in are 
valid and doesn't check for errors. 

Strings 

A PyStringObject is a specific type of sequence used to hold Python strings. PyString_Check(PyObject *o) returns 1 if the given object is a string object; it verifies that o's type is PyString_Type (equivalent to types.StringType in Python).

You create a new string object from a null-terminated C string by calling PyString_FromString(const char *s). It returns a PyObject pointer that is a new reference to a string object of that value. For strings that might have embedded null characters, use PyString_FromStringAndSize(const char *s, int len).

PyString_Format(PyObject *format, PyObject *args) returns a new reference to a string object created using a format string and a tuple of arguments, equivalent to the Python format % args.

After creating a new string object, you can call _PyString_Resize(PyObject **s, int newsize) to change its size to newsize. To Python, strings are immutable, so it is safe to call this function only if no other part of the program knows about the string yet (for example, when you just created it).

Going the other direction, PyString_AsString(PyObject *s) returns a char pointer to the string data, converting the object to a string first if needed.

PyString_AsStringAndSize(PyObject *s, char **buffer, int *len) sets buffer to point at a string representation of the object, returns the string length in len (len can be NULL as long as the string has no embedded null characters), and returns -1 on failure. Both functions return pointers to internal buffers that you shouldn't modify or de-allocate.

Tip PyString_AsStringAndSize works on both string and Unicode string objects.



PyString_Size(PyObject *s) returns the length of the string. If the object is not already a string, the function first calls PyString_AsStringAndSize and then returns the size.

In cases where you know the object really is a string object, you can call PyString_AS_STRING(PyObject *s) and PyString_GET_SIZE(PyObject *s) for better performance.
PyString_Concat(PyObject **s, PyObject *new) concatenates new onto the end of s. The function itself returns no value; s contains a new reference to the concatenated string object, or NULL on failure. PyString_Concat calls Py_DECREF on the s object you pass in; in effect, you are transferring your reference to it and it gives you a new reference back. PyString_ConcatAndDel(PyObject **s, PyObject *new) is a utility function that calls PyString_Concat and then calls Py_XDECREF on new so you don't have to.

PyString_InternInPlace(PyObject **s) is equivalent to the Python intern
function. When you call this function, you transfer ownership of the reference and receive
back a new reference. The object that you receive will be either the original object or a
previously interned string of the same value. PyString_InternFromString(const char *s)
is a utility function that converts a C string to a Python string
object, interns it, and returns to you a new reference to the result.

PyString_Encode(const char *s, int size, const char *encoding, const char *errors)
returns a new reference to an encoded string object. The encoding and errors
arguments are the same as those for the Python encode method (for example,
errors can have the values "strict", "ignore", and "replace").

PyString_AsEncodedString(PyObject *unicode, const char *encoding,
const char *errors) works the same way, but takes a PyUnicodeObject.

PyString_Decode(const char *s, int size, const char *encoding, const char *errors)
returns a new reference to a decoded string object, like the Python unicode function.

See “Unicode strings” later in this chapter for more functions to handle Unicode
objects.


Lists 

A PyListObject holds a Python list; it has the type PyList_Type (equivalent to
Python’s types.ListType). PyList_Check(PyObject *p) returns 1 if the given
object is a list.

Table 30-4 lists Python list operations and their equivalent C function calls. Unless
returning an actual numeric value, functions returning integers return 0 to denote
success and -1 to denote failure.


Table 30-4
C List Functions

Python           Equivalent C Function Call
len(t)           int PyList_Size(PyObject *t)
t[i]             PyObject *PyList_GetItem(PyObject *t, int i)
t[i] = o         int PyList_SetItem(PyObject *t, int i, PyObject *o)
t.insert(i, o)   int PyList_Insert(PyObject *t, int i, PyObject *o)
t.append(o)      int PyList_Append(PyObject *t, PyObject *o)
t[a:b]           PyObject *PyList_GetSlice(PyObject *t, int a, int b)
t[a:b] = t2      int PyList_SetSlice(PyObject *t, int a, int b, PyObject *t2)
t.sort()         int PyList_Sort(PyObject *t)
t.reverse()      int PyList_Reverse(PyObject *t)
tuple(t)         PyObject *PyList_AsTuple(PyObject *t)



The list functions that take an index parameter don’t support Python’s negative
indexing, so pass an index from 0 to len(t) - 1; PyList_GetItem and PyList_SetItem
report an out-of-range index by raising IndexError. PyList_GetItem returns a borrowed
reference to an item; with PyList_SetItem, you give up (transfer) ownership of the
reference, but PyList_Insert behaves “normally” (it increments the reference count of the
object passed in). Don’t forget that setting a list item replaces another item, causing
its reference count to be decremented (which could in turn call its destructor).

PyList_GetSlice returns a new reference to a list object containing the requested
objects; those objects are also new references to the originals. PyList_SetSlice
requires that both arguments (t and t2) be list objects. PyList_AsTuple returns a
new reference to a tuple object, and each member of the tuple is a new reference as
well.

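PyList_New(int len) returns a new reference to a list object that has an initial
length of len. The new slots start out as NULL, so fill each one with PyList_SetItem
(or PyList_SET_ITEM) before passing the list to any other code.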

PyList_GET_SIZE(PyObject *t) is a slightly faster way to retrieve a list’s size; it
doesn’t verify that the object t is really a list. The same is true for
PyList_GET_ITEM(PyObject *t, int i) and PyList_SET_ITEM(PyObject *t,
int i, PyObject *o), which also skip the index check.

Tuples 

A PyTupleObject is the C version of a Python tuple; it has the type PyTuple_Type,
which is the same as Python’s types.TupleType. Call PyTuple_Check(PyObject
*o) to determine whether an object is a tuple.

PyTuple_New(int len) returns a new reference to a tuple object of length len.

_PyTuple_Resize(PyObject **o, int newsize, int last_is_sticky) resizes a given tuple
(pass 0 as the last argument); as with the string resize function, it is safe to call
only if no other references to this object exist. This function returns 0 on success.

Table 30-5 lists the C function calls for common tuple operations. 
Table 30-5
C Tuple Functions

Python     Equivalent C Function Call
len(t)     int PyTuple_Size(PyObject *t)
t[i]       PyObject *PyTuple_GetItem(PyObject *t, int i)
t[i] = o   int PyTuple_SetItem(PyObject *t, int i, PyObject *o)
t[a:b]     PyObject *PyTuple_GetSlice(PyObject *t, int a, int b)


PyTuple_GET_ITEM(PyObject *t, int i) and PyTuple_SET_ITEM(PyObject *t,
int i, PyObject *o) are faster versions of PyTuple_GetItem and
PyTuple_SetItem; they assume you’re honest and pass in tuple objects and valid indexes.

The same rules apply here as for lists: negative indexes are not supported,
PyTuple_GetItem returns a borrowed reference, and PyTuple_GetSlice increments
the reference count for each object in the slice. In addition,
PyTuple_SetItem transfers ownership of the reference to the tuple, and the reference
count of the item being replaced is decremented by 1.

Buffers 

Python objects in C can implement a buffer interface, which is a group of functions
that let an object expose the memory where it stores its data. Buffer interfaces are
often used by low-level or performance-conscious functions that want to access data in
its raw byte format without having to copy the data.

The C PyBufferObject structure is used to represent a Python buffer. These
objects have a type of PyBuffer_Type. As usual, you can call
PyBuffer_Check(PyObject *o) to determine whether an object is a buffer.

Given an object that has an internal buffer, PyBuffer_FromObject(PyObject *o,
int offset, int size) creates a read-only buffer object to access the data starting
at the given offset. If the object allows reading and writing of its buffer data,
using PyBuffer_FromReadWriteObject(PyObject *o, int offset, int size)
creates a buffer object that supports writing too. Both functions return a new
reference to a buffer object, and for size you can use the constant Py_END_OF_BUFFER
to include all data from the given offset to the end of the object.

You can wrap a block of memory into a read-only buffer object by calling
PyBuffer_FromMemory(void *p, int size). PyBuffer_FromReadWriteMemory
(void *p, int size) does the same thing but allows writing to the buffer as well.
Of course, for both of these functions you need to ensure the block of memory is
valid for as long as the buffer object exists. An alternative is to let the buffer own
and manage the block of memory itself; call PyBuffer_New(int size) to create a
memory buffer of the given size.

PyObject_AsReadBuffer(PyObject *o, const void **buffer, int *size)
returns a pointer and a size value for a given object’s internal buffer. For objects
that support it, PyObject_AsWriteBuffer(PyObject *o, void **buffer, int
*size) returns the same information for a writable buffer. Both functions return 0
on success and -1 on failure.

Unicode strings 

Unicode strings have the type PyUnicode_Type and are stored in a
PyUnicodeObject structure. Call PyUnicode_Check(PyObject *o) to determine
whether an object is a Unicode string. The actual characters of the string are stored
in a member of this structure having the type Py_UNICODE, which is a C typedef for
16-bit values.

Note On platforms such as Windows, which provide a usable wide character type
(wchar_t), Python uses this type and its supporting functions for better performance
and compatibility.

PyUnicode_GET_SIZE(PyObject *o) returns the number of characters in the
string, and PyUnicode_GET_DATA_SIZE(PyObject *o) returns the number of
bytes used to store the string (string length * size of each character).

PyUnicode_GetSize(PyObject *o) returns the string’s length after verifying that
the object is a Unicode string.

Converting to and from Unicode 

PyUnicode_AS_UNICODE(PyObject *o) returns a read-only pointer to the structure’s
internal Py_UNICODE member, and PyUnicode_AS_DATA(PyObject *o) does
the same but casts the return pointer to char *. PyUnicode_AsUnicode(PyObject
*o) returns a pointer to the internal data but first ensures that the object really is a
Unicode string. PyUnicode_AsWideChar(PyUnicodeObject *o, wchar_t *buff,
int length) copies the Unicode string into the given buffer, copying at most
length characters, and returns the number of characters copied.

PyUnicode_FromUnicode(const Py_UNICODE *buff, int length) returns a new
reference to a Unicode string object of the given length whose contents were
copied from buff if it was not NULL. PyUnicode_FromWideChar(const wchar_t
*buff, int length) does the same but copies from a wide character buffer
pointer that must not be NULL.

PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding,
const char *errors) uses encoding and errors to coerce an encoded object to a
Unicode object if needed, and returns a new reference to the Unicode object. Set
encoding and errors to NULL to use the defaults.
PyUnicode_FromObject(PyObject *obj) is a utility function that calls
PyUnicode_FromEncodedObject with encoding set to NULL, and errors set to
"strict".

PyUnicode_Decode(const char *s, int length, const char *encoding, const
char *errors) takes a string length bytes long that uses the given encoding and
converts it to Unicode, returning to you a new reference to the Unicode object.
PyUnicode_Encode(const Py_UNICODE *s, int length, const char
*encoding, const char *errors) encodes a Py_UNICODE buffer and returns a
Python string object.

Note The API also provides shortcut routines for encoding and decoding strings using
standard encodings such as 7-bit ASCII, UTF-8, UTF-16, Latin-1, and so on. See
unicodeobject.h for details.


Checking and converting individual characters 

Py_UNICODE_ISSPACE(Py_UNICODE ch) returns 1 if the given Unicode character is
whitespace. Additionally, you can use the following macros to perform other similar checks:


Py_UNICODE_ISLOWER
Py_UNICODE_ISUPPER
Py_UNICODE_ISTITLE
Py_UNICODE_ISLINEBREAK
Py_UNICODE_ISDECIMAL
Py_UNICODE_ISDIGIT
Py_UNICODE_ISNUMERIC
Py_UNICODE_ISALPHA
Py_UNICODE_ISALNUM


Py_UNICODE_TOLOWER(Py_UNICODE ch), Py_UNICODE_TOUPPER, and
Py_UNICODE_TOTITLE return the given character converted to lowercase, uppercase, and
titlecase, respectively.

Py_UNICODE_TODECIMAL(Py_UNICODE ch) and Py_UNICODE_TODIGIT return the
given character converted to an integer decimal and an integer digit (usually the
same thing). Py_UNICODE_TONUMERIC(Py_UNICODE ch) returns a double holding
the numeric value of the given character (for example, given the single-character
symbol for one-half, it would return the number 0.5).


Using string manipulation functions 

The following PyUnicode functions work like their PySequence and PyString
counterparts:

PyObject *PyUnicode_Concat(PyObject *a, PyObject *b)
PyObject *PyUnicode_Split(PyObject *s, PyObject *sep, int maxsplit)
PyObject *PyUnicode_Join(PyObject *sep, PyObject *sequence)
int PyUnicode_Count(PyObject *str, PyObject *substr, int start, int end)
int PyUnicode_Contains(PyObject *container, PyObject *element)
int PyUnicode_Compare(PyObject *left, PyObject *right)
PyObject *PyUnicode_Format(PyObject *format, PyObject *args)

PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr,
int maxcount) works like the normal string replace function; maxcount is the maximum
number of replacements to perform (-1 replaces every occurrence).

PyUnicode_Find(PyObject *str, PyObject *substr, int start, int end, int
direction) returns the index of the first match of substr in str[start:end],
searching left-to-right if direction is 1, and right-to-left if direction is -1. It
returns -1 if there is no match.

PyUnicode_Splitlines(PyObject *s, int maxsplit) returns a list of strings
split at line breaks (the line break characters are removed), stopping after all text
has been processed or maxsplit splits have occurred.

PyUnicode_Tailmatch(PyObject *str, PyObject *substr, int start, int
end, int direction) checks whether substr matches a portion of str. If
direction is -1, the function returns 1 if str[start:end] starts with substr. If
direction is 1, the function returns 1 if str[start:end] ends with substr.
Otherwise, it returns 0.

PyUnicode_Translate(PyObject *str, PyObject *table, const char
*errors) maps characters to new values using a lookup table. The table object
can be a dictionary or sequence (or anything that has a __getitem__ method). For
each character in str, the function looks up its entry in table and inserts the new
value in the result (a Unicode object returned to you as a new reference). If the
character’s entry in the table has a value of None, the character is deleted in the
result (not copied); and if there is no entry in the table (the lookup causes a
LookupError), the character is copied as is.


Working with Mapping Objects 

Although Python currently has only one mapping object type, the Python/C API still
makes a distinction between the abstract and concrete object layers.

Functions for any mapping type 

PyMapping_Check(PyObject *o) returns 1 if the object is a mapping object. Table
30-6 lists Python code for common mapping object operations and the corresponding
C function calls.
Table 30-6
C Mapping Functions

Python           Equivalent C Function Call
len(o)           PyMapping_Length(PyObject *o)
o[key]           PyMapping_GetItemString(PyObject *o, char *key)
o[key] = val     PyMapping_SetItemString(PyObject *o, char *key, PyObject *val)
del o[key]       PyMapping_DelItem(PyObject *o, PyObject *key)
                 PyMapping_DelItemString(PyObject *o, char *key)
o.has_key(key)   PyMapping_HasKey(PyObject *o, PyObject *key)
                 PyMapping_HasKeyString(PyObject *o, char *key)
o.keys()         PyMapping_Keys(PyObject *o)
o.values()       PyMapping_Values(PyObject *o)
o.items()        PyMapping_Items(PyObject *o)


Dictionaries 

Dictionaries are represented by PyDictObject structures, and they have the type
PyDict_Type (types.DictionaryType in Python). PyDict_Check(PyObject *o)
returns 1 if the given object is a dictionary.

Table 30-7 lists dictionary operations in Python and C. 



Table 30-7
C Dictionary Functions

Python           Equivalent C Function Call
d = {}           PyDict_New()
d.clear()        PyDict_Clear(PyObject *d)
len(d)           PyDict_Size(PyObject *d)
d[key]           PyDict_GetItem(PyObject *d, PyObject *key)*
                 PyDict_GetItemString(PyObject *d, char *key)*
d[key] = val     PyDict_SetItem(PyObject *d, PyObject *key, PyObject *val)
                 PyDict_SetItemString(PyObject *d, char *key, PyObject *val)
del d[key]       PyDict_DelItem(PyObject *d, PyObject *key)
                 PyDict_DelItemString(PyObject *d, char *key)
d.keys()         PyDict_Keys(PyObject *d)
d.values()       PyDict_Values(PyObject *d)
d.items()        PyDict_Items(PyObject *d)
d.copy()         PyDict_Copy(PyObject *d)

* Returns a borrowed reference to the value object.


Using Other Object Types 

The following sections describe a few other miscellaneous object types available in 
the Python/C API. 

Type 

PyTypeObject structures describe Python’s built-in types. These objects have the
type PyType_Type, and PyType_Check(PyObject *o) returns 1 if the given
object is a type object.

None 

Py_None is the C equivalent of Python’s None. Use this anyplace to denote a lack of 
value instead of using NULL, because the Python/C API uses NULL to indicate an 
error. 

Note Py_None is an actual object, so treat it like any other with respect to reference
counting. For example, when a C extension module function has no return value,
it should use the following idiom:

Py_INCREF(Py_None); 
return Py_None; 


File 

Python file objects are thin wrappers around FILE objects in the standard C
library. The Python/C API uses a PyFileObject structure to represent a file
object; these structures have the type PyFile_Type, and you can call
PyFile_Check(PyObject *o) to verify that an object is a file.





572 Part V > Advanced Python Programming 


PyFile_FromFile(FILE *f, char *name, char *mode, int (*close)(FILE *))
creates a Python file object from a C file of the given name and mode. The file
pointer f must be an already open file or NULL (although you should fill in a valid
FILE structure before letting any other code use it). The close argument is the
function to call to close the file; you can pass in the standard C fclose function if
you don’t need anything special.

PyFile_FromString(char *fname, char *mode) uses mode to open (or create,
depending on the mode) a file named fname. Like PyFile_FromFile, it returns a
new reference to a Python file object.

You can access the FILE pointer of a Python file object using
PyFile_AsFile(PyObject *f), and PyFile_Name(PyObject *f) returns a borrowed
reference to a string object containing the file’s name.

To simulate f.readline(n), call PyFile_GetLine(PyObject *f, int n). If the
end of file has been reached, the function still returns a string object (but of length
0). If n is 0, the function reads one line, and if n is greater than 0, the function will
read up to n bytes. If n is less than 0, the function reads one line of data but raises
EOFError if the end of file has been reached already.

You can set or clear the softspace flag of a file or filelike object by calling
PyFile_SoftSpace(PyObject *f, int flag). A value of 1 means that a space will
be output before the next data is written to the file.

PyFile_WriteString(char *s, PyObject *f) writes a string to an open file.

PyFile_WriteObject(PyObject *o, PyObject *f, int flags) writes a string
representation of the given object o to the file f. By default, it gets the output by
calling repr; use a flags value of Py_PRINT_RAW to have it call str instead.

Module 

The Python/C API has functions for working with module objects and importing
them, as described in the following sections.


Module objects 

PyModuleObject structures have the type PyModule_Type, and
PyModule_Check(PyObject *o) returns 1 if the object o is a module.

PyModule_New(char *name) returns a new reference to a new module object and
creates the module’s namespace dictionary. The module’s __name__ member is set
to name, and its __doc__ member is set to an empty string. Before letting other parts
of the program use the new module, you should at least set its __file__ member.

PyModule_GetDict(PyObject *m) returns a borrowed reference to the module’s
dictionary (__dict__). PyModule_GetName(PyObject *m) returns a char pointer
to the value of the module’s __name__ member, and
PyModule_GetFilename(PyObject *m) returns a char pointer to the value of its
__file__ member.



The following functions were introduced in Python 2.0. 


PyModule_AddObject(PyObject *m, char *name, PyObject *value) adds the
object value to the module m. This function steals a reference to value when it
succeeds; if it fails, you still own the reference.


PyModule_AddIntConstant(PyObject *m, char *name, int value) is a utility
function that creates an integer object with the given value and adds it to the
module. PyModule_AddStringConstant(PyObject *m, char *name, char *value)
does the same for a string value.


Importing modules 

PyImport_ImportModule(char *name) loads the requested module and returns a
new reference to it. Internally, PyImport_ImportModule calls
PyImport_ImportModuleEx(char *name, PyObject *globals, PyObject
*locals, PyObject *fromlist), which loads a module with the given global and
local dictionaries, which may be NULL. Python’s __import__ function calls
PyImport_ImportModuleEx.

PyImport_Import(PyObject *name) also loads a module, but it uses the current
import hooks to do the loading.

Chapter 35 shows you how to override importing behavior using import hooks. 


PyImport_ReloadModule(PyObject *m) reloads the given module (just like the
Python reload() function) and returns a new reference to it.

PyImport_AddModule(char *name) returns a borrowed reference to a module
called name, creating an empty module object if necessary.

PyImport_GetModuleDict() returns a borrowed reference to the module dictionary
(stored in sys.modules).

PyImport_ExecCodeModule(char *name, PyObject *co) returns a new reference
to a module object. The module is created and imported using co, which is a code
object (obtained from a call to compile or read in from a .pyc file). If the module
already exists, it is reloaded using the given code object.

PyImport_GetMagicNumber() returns a C long containing the little-endian, 4-byte
magic number at the start of all .pyc and .pyo files.

Before a call to Py_Initialize, you can add your module to the list of built-in
modules by calling PyImport_AppendInittab(char *name, void (*initfunc)(void)),
passing in the module name and initialization function to call. To add
several modules, call PyImport_ExtendInittab(struct _inittab *newtab),
where newtab is an array of entries for each module, with an extra entry on the end
with a NULL name to denote the end of the list. The _inittab structure has the
following format:

struct _inittab {
    char *name;
    void (*initfunc)(void);
};


PyImport_ImportFrozenModule(char *name) loads a frozen module (created
with the Freeze utility). This function only loads the module; you still need to call
PyImport_ImportModule to import it.

The Freeze utility is covered in Chapter 36. 



CObjects 

Occasionally, it’s necessary to pass a C object (well, a pointer) from a function
through Python code and back into C again. The PyCObject structure is the
Python/C API equivalent of a void pointer for just this purpose. Your code can call
PyCObject_Check(PyObject *o) to determine whether an object is of this type.

To create a PyCObject, call PyCObject_FromVoidPtr(void *cobj, void
(*destr)(void *)), which returns a new reference to the object. destr is a
function that will be called when Python is about to destroy the object. If you don’t
need to do any cleanup, this argument can be NULL.

PyCObjects can also contain some extra information called a description. Call
PyCObject_FromVoidPtrAndDesc(void *cobj, void *desc, void
(*destr)(void *, void *)) to create an object with a description. Note that the
destructor function receives both the object and its description when called.

PyCObject_GetDesc(PyObject *o) returns a pointer to the description data, and
PyCObject_AsVoidPtr(PyObject *o) returns the original C pointer used to create
the PyCObject.


Creating Threads and Sub-Interpreters 

One application can have multiple interpreters running, and each interpreter can
have multiple threads, but they all share the Global Interpreter Lock (GIL). In order
to operate on a Python object, a thread must have control of the GIL or it risks
corrupting memory.
The interpreter releases and reacquires the lock often to ensure that each thread
gets a chance to run; you can set how many bytecode instructions it processes
before releasing the lock by calling sys.setcheckinterval(n); the current default
is 10 instructions.

Before potentially blocking I/O routines or long computations that don’t require 
working with Python objects, your code should manually release the lock and then 
reacquire it when the work is complete. 

Threads 

Each thread has some state information stored in a PyThreadState structure, and
a global variable holds a pointer to the current thread’s state. To release and
reacquire the GIL, use the standard Python macros:

Py_BEGIN_ALLOW_THREADS 
    /* ... work that does not touch Python objects ... */
Py_END_ALLOW_THREADS 

Among other things, these macros call PyEval_ReleaseLock and
PyEval_AcquireLock on the global lock.

Caution When working with the global interpreter lock, pay close attention to when you
release it and acquire it. Trying to acquire it once you already have it (through a
recursive call, for example) is an excellent way to cause deadlock and bring your
program to a screeching halt.

PyEval_InitThreads() initializes the thread subsystem and acquires the GIL
(creating it if necessary). It’s safe to call this before Py_Initialize, although this is
normally called automatically so that you don’t need to.

Before a new thread created in C can access Python objects, it has to manually
create its own thread state, acquire the GIL, and then set the current thread state to
point to the new thread’s state. When finished, it needs to reset the old thread state
and release the lock. The Python/C API has several pairs of functions for working
with the GIL and the current thread state.

PyThreadState_New(PyInterpreterState *interp) creates a new thread state
structure. The interp argument is the current interpreter’s state, which is
accessible as the interp variable of any thread state structure. Call
PyThreadState_Get() to get a pointer to the current thread state. Although you
must have the GIL to get the current state, you do not need to have it to create a
new thread state structure.

Call PyThreadState_Clear(PyThreadState *state) to clear a thread’s state
before calling PyThreadState_Delete(PyThreadState *state) to free the thread
state memory. You must have the GIL to clear a thread state structure, but not to
delete it.
PyEval_AcquireLock() and PyEval_ReleaseLock() acquire and release the
global interpreter lock, respectively.

PyEval_AcquireThread(PyThreadState *state) acquires the GIL and sets the
current state to state. PyEval_ReleaseThread(PyThreadState *state) sets the
current thread state to NULL and releases the GIL. You have to pass in your thread
state as a safety check to ensure that the correct thread is releasing the lock.

PyThreadState_Swap(PyThreadState *state) swaps the current thread state
with state, which can be NULL (leaving the thread state selection up to the
interpreter), and returns the previous thread state. You must have the GIL to call
this function.

Sub-interpreters 

The global state for the interpreter is stored in a PyInterpreterState structure.
PyInterpreterState_New() creates a new state structure,
PyInterpreterState_Clear(PyInterpreterState *state) clears it before you
release it, and PyInterpreterState_Delete(PyInterpreterState *state)
frees its associated memory. You don’t need to have the GIL to create or destroy an
interpreter state structure, but you do need to hold it to clear one.

Py_NewInterpreter() creates a new sub-interpreter that is almost completely
independent of other interpreters (it still shares the global interpreter lock,
however). The function returns a PyThreadState pointer that represents the
now-current thread state in the new interpreter. Do not call this function until after
you’ve called Py_Initialize and you have the GIL. Although the thread state has been
created, you still need to create a new thread.

Py_EndInterpreter(PyThreadState *state) destroys the sub-interpreter to
which the given thread state belongs. All thread states for that interpreter are also
destroyed; and on return, the current thread state is NULL. You must hold the global
lock to call this function.

Tip Py_Finalize automatically destroys all sub-interpreters.



Handling Errors and Exceptions

The general error convention used in the Python/C API is that when a function fails,
it returns an error value (usually NULL) and sets an error flag to indicate that an
exception has been raised. Only the function at the “source” of the problem should
raise the exception; functions that merely pass the error value back to their callers
shouldn’t raise another one.
Checking for errors

In addition to checking return codes, you can call PyErr_Occurred(). If an
exception has been raised, this function returns a borrowed reference to the exception
type, and NULL otherwise. PyErr_Print() prints the stack traceback for the
exception that was raised and then clears the error flag. You can also call
PyErr_Clear() to clear the error flag; use this if you don’t want a raised exception
to make it back to the rest of the program.

After an error occurs, you can see if it matches a specific type by calling
PyErr_ExceptionMatches(PyObject *e), which returns 1 to indicate a match.
The value e is a pointer to an exception object. Exceptions in C are named the same
as in Python, but with a “PyExc_” prefix. For example, if a function returned NULL
to indicate an error and you want to see if it was an ImportError, you’d use
something like the following:

if (PyErr_ExceptionMatches(PyExc_ImportError)) {
    /* handle the missing module here */
}


PyErr_GivenExceptionMatches(PyObject *given, PyObject *e) returns 1 if
the two exceptions match.

Tip Both of these functions let you perform multiple checks with a single call. The
object e can be a Python tuple containing a sequence of exceptions (or other
tuples too) to compare against.

Signaling error conditions 

PyErr_SetString(PyObject *e, char *info) signals that the exception e has
occurred (where e is one of Python’s exception objects as explained above). info
is an extra message to be displayed with the exception name.

PyErr_Format(PyObject *e, const char *format, ...) sets the error indicator
with a message built using printf-style formatting, and always returns NULL. The
recognized format codes are c (character), d (decimal), x (hex), and s (string).

Instead of a string, you can set the extra information to be any Python object with
PyErr_SetObject(PyObject *e, PyObject *value). If you don’t want to provide
any extra information with your error, just call PyErr_SetNone(PyObject *e).

Many C library calls fail and set the per-thread error variable errno. Use
PyErr_SetFromErrno(PyObject *e) to raise an exception of type e, using the value
in errno to come up with an appropriate informational message.

Note You do not need to increment reference counts on any of the Python objects
passed to the error functions listed above.
Tip If you need to temporarily save and restore the current error state, call
PyErr_Fetch(PyObject **type, PyObject **value, PyObject **traceback)
to save it, and call PyErr_Restore(PyObject *type, PyObject *value,
PyObject *traceback) to restore it.

Several functions raise exceptions for common problems. For example, if a direct
call to one of Python’s memory manager routines fails, you should call
PyErr_NoMemory(). If one of your functions is called with a wrong argument type,
call PyErr_BadArgument() to raise a TypeError.

Sometimes an error occurs but an exception cannot be raised (inside an object
destructor, for example). In this case, PyErr_WriteUnraisable(PyObject *obj)
can be called to write a warning to stderr. It also prints the repr representation
of obj.

When an error occurs, don't forget to release owned references before your function
exits. In addition, when raising exceptions, use the exception type that best
matches the type of error that occurred. 

Creating custom exceptions 

It’s pretty easy to create a new exception type in C. For example, suppose you are
writing a caching extension module called cache and need to create an exception
that will be known in Python as cache.error. Use the following steps to create an
exception type:

1. Declare a static PyObject pointer for the error: 
static PyObject *Cache_Error; 

2. In the module’s initialization function, create the error object: 

Cache_Error = PyErr_NewException("cache.error", NULL, NULL); 

3. Using the module’s dictionary object (d in this example), add the exception to its namespace:

PyDict_SetItemString(d, "error", Cache_Error); 

Raising warnings 

The PyErr_Warn(PyObject *category, char *message) function sends the warning
pointed to by message to the user, which Python by default displays on standard
error. The category parameter can be any of the following global warning
variables:

PyExc_Warning
PyExc_DeprecationWarning
PyExc_RuntimeWarning
PyExc_SyntaxWarning
PyExc_UserWarning

Under normal circumstances PyErr_Warn returns 0, but if the user configures 
Python to escalate warnings to errors, then the function returns -1 to indicate that 
it raised an exception. If it does raise an exception, be sure to treat it like any other 
exception by releasing owned references and returning an error code from the
current function.
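On the Python side, the escalation mentioned above is governed by the warnings module. The following sketch (current Python 3; old_api is a hypothetical function) issues a DeprecationWarning, the Python-level counterpart of PyErr_Warn(PyExc_DeprecationWarning, ...), first with warnings merely recorded and then with every warning escalated to an error:

```python
import warnings

def old_api():
    # Hypothetical function that warns, much as C code would via PyErr_Warn.
    warnings.warn("old_api() is deprecated", DeprecationWarning)
    return "result"

def call(escalate):
    with warnings.catch_warnings(record=True) as caught:
        # "error" turns every warning into an exception; "always" just records it.
        warnings.simplefilter("error" if escalate else "always")
        try:
            return old_api(), len(caught)
        except DeprecationWarning as exc:
            # The warning arrived as an exception: handle it like any other error.
            return "raised: %s" % exc, len(caught)

print(call(escalate=False))  # ('result', 1)
print(call(escalate=True))   # ('raised: old_api() is deprecated', 0)
```

When the filter action is "error", the warning surfaces as an ordinary exception, which is the situation in which PyErr_Warn returns -1 to C code.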

New Feature Warnings are new in Python 2.1.


PyErr_WarnExplicit(PyObject *category, char *message, char *filename,
int lineno, char *module, PyObject *registry) lets you raise a warning and
have complete control over all warning attributes. This function calls the
warn_explicit function in the Python warnings module.
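The Python counterpart is warnings.warn_explicit. Here is a minimal sketch in current Python 3 (the message, file name, line number, and module name are invented for illustration). It records the warning instead of printing it, so you can check that each attribute came from the call rather than from the caller's stack frame:

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Attribute the warning to an explicit (hypothetical) file, line, and module.
    warnings.warn_explicit("cache size is very large", UserWarning,
                           "cachesetup.py", 42, module="cachesetup")

w = caught[0]
print(w.category.__name__, w.filename, w.lineno, w.message)
# UserWarning cachesetup.py 42 cache size is very large
```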

Cross-Reference Chapter 5 covers the warnings module, through which you can control how
Python handles warning messages.


Managing Memory 

Python has its own private memory pool, or heap, in which it stores all Python
objects and their data. Because it has its own memory allocation and de-allocation
routines, you shouldn’t use malloc, free, new, and delete on Python objects. In
fact, although it’s safe for you to use the normal C memory allocators for your own
private memory usage, it doesn’t hurt to always use the Python memory manager.

PyMem_MALLOC(size_t n) returns a void pointer to a block of n bytes of memory, and
PyMem_FREE(void *p) frees the memory pointed to by p if p is not NULL.

PyMem_NEW(TYPE, size_t n) allocates enough memory to store n items of type
TYPE, where TYPE is any C data type (that is, it allocates sizeof(TYPE) * n bytes of
memory). It returns a pointer of the same type. PyMem_DEL(p) frees the memory
associated with p.

PyObject_NEW(TYPE, PyTypeObject *t) creates a new Python object using the
given C structure type and its corresponding Python type object:

PyObject_NEW(dictobject, &PyDict_Type) // Create a dictionary 

PyObject_DEL(p) frees an object’s memory.
Summary

The Python/C API must be full-featured because it’s the same set of functions used 
to create the built-in modules and much of the interpreter itself. While not as easy 
to use as Python, the API makes working with Python objects in C at least tolerable. 
In this chapter, you learned about: 

♦ Tracking the reference counts of Python objects.

♦ Using the abstract and concrete object layers to manipulate objects.

♦ Raising and handling Python exceptions in C.

♦ Managing memory using Python’s memory heap functions.

In the next chapter, you learn about Python’s number-crunching abilities, from
advanced math functions and complex numbers to random numbers and arbitrary-precision integers.



Number Crunching


Python can crunch numbers with the best of them. It
offers built-in complex numbers, functions to handle
advanced mathematics, random number generators, and more.
This chapter covers Python’s number-crunching abilities.

In This Chapter

Using math routines

Computing with complex numbers

Generating random numbers

Using arbitrary-precision integers


Using Math Routines

The math module provides various higher-math functions.
The functions raise a ValueError if passed an input not in
their domain.

The math module also provides the constants pi and e:

import math

def Circumference(Radius):
    return Radius*2*math.pi

def ContinuousCompounding(Principal,InterestRate,Years):
    # Find the balance in a bank account, after some time
    # earning the specified interest rate (for example, .05),
    # compounded continuously.
    return Principal * math.pow(math.e, InterestRate*Years)


Rounding and fractional parts 

The function ceil(x) returns the smallest integer >=x.
floor(x) returns the largest integer <=x. To round to the
nearest integer, use the built-in function round. For instance:

>>> math.ceil(2.2),math.floor(2.2),round(2.5)
(3.0, 2.0, 3.0)

>>> math.ceil(-2.5),math.floor(-3)

(-2.0, -3.0) 
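Under current Python 3 this session gives different answers: math.ceil and math.floor return ints rather than floats, and round() sends exact halves to the nearest even integer, so round(2.5) is 2. If you need the old "round half away from zero" behavior, a small helper will do. round_half_away is a hypothetical name, and this is a sketch rather than a library function:

```python
import math

print(math.ceil(2.2), math.floor(2.2), round(2.5))  # 3 2 2
print(math.ceil(-2.5), math.floor(-3))               # -2 -3

def round_half_away(x):
    # Round to the nearest integer, sending exact halves away from zero.
    # Caveat: x + 0.5 can itself round up for values just below a half,
    # e.g. 0.49999999999999994.
    if x >= 0:
        return math.floor(x + 0.5)
    return -math.floor(-x + 0.5)

print(round_half_away(2.5), round_half_away(-2.5), round_half_away(2.4))  # 3 -3 2
```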
The function modf(x) returns a tuple of the form (Frac,Int), where Int is the integral
part of x, and Frac is the fractional part:

>>> math.modf(3) 

(0.0, 3.0) 

>>> math.modf(-2.22)

(-0.2200000000000002, -2.0) 

General math routines 

The function sqrt (x) returns the square root of a non-negative number x. 

The function hypot(x,y) returns the hypotenuse of a triangle with sides of length
x and y; that is, it returns math.sqrt(x*x + y*y).

The function fmod(x,y) returns the remainder when x is divided by y. It uses the
platform C library, so its result takes the sign of x, while x%y takes the sign of y.
The two agree when x and y have the same sign but can differ when their signs differ.
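A quick comparison in current Python 3 with operands of mixed sign, where math.fmod follows the sign of x and % follows the sign of y:

```python
import math

for x, y in [(7, 3), (-7, 3), (7, -3)]:
    # fmod follows C (result has the sign of x); % follows the sign of y.
    print(x, y, math.fmod(x, y), x % y)
# 7 3 1.0 1
# -7 3 -1.0 2
# 7 -3 1.0 -2
```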

Logarithms and exponentiation 

The function exp(x) returns e to the power of x, while log(x) returns the natural
logarithm of x. The function log10(x) returns the base-10 logarithm of x. The function
pow(x,y) returns x raised to the power of y.

Note that 5**-1 (an integer to a negative power) is illegal, but math.pow(5,-1) is
legal (and equals 0.2, as you would expect). math.pow(-5,0.5) is still illegal; for
that, you need to use the cmath module. (See “Computing with Complex Numbers”
later in this chapter.)
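In current Python 3, 5**-1 simply returns 0.2, but math.pow still refuses a negative base with a fractional exponent. This sketch contrasts math.pow with cmath.sqrt:

```python
import math
import cmath

print(5 ** -1, math.pow(5, -1))   # 0.2 0.2

try:
    math.pow(-5, 0.5)             # no real result exists
except ValueError as exc:
    print("math.pow:", exc)       # math.pow: math domain error

print(cmath.sqrt(-5))             # a purely imaginary result, about 2.236j
```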

The function ldexp(x,y) (short for “load exponent”) returns x * (2**y). The function
frexp(x) returns the mantissa and exponent of x: a tuple (a,b) such that x
== a * (2**b). The exponent, b, is an integer. The mantissa, a, is such that
0.5<=abs(a)<1, unless x is 0, in which case, frexp(x)==(0.0,0).
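Because ldexp undoes frexp, the pair makes a handy round-trip check. A short sketch in current Python 3, where the exponent comes back as an int:

```python
import math

for x in (8.0, 0.3, -12.5, 0.0):
    mantissa, exponent = math.frexp(x)
    # ldexp rebuilds x exactly from its mantissa and exponent.
    assert math.ldexp(mantissa, exponent) == x
    print(x, mantissa, exponent)
# 8.0 0.5 4
# 0.3 0.6 -1
# -12.5 -0.78125 4
# 0.0 0.0 0
```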

Trigonometric functions 

The functions sin(x), cos(x), and tan(x) return the sine, cosine, and tangent
(respectively) of an angle x, measured in radians:

>>> math.cos(math.pi) 

-1.0 

>>> DEGREES_TO_RADIANS = math.pi/180

>>> math.tan(45*DEGREES_TO_RADIANS) # Convert degrees to radians
0.99999999999999989 

The functions sinh(x), cosh(x), and tanh(x) compute hyperbolic sine, cosine,
and tangent, respectively.
The inverse trigonometric functions asin(x), acos(x), and atan(x) return the arc
sine, arc cosine, and arc tangent of x, respectively. The values of asin(x) and
atan(x) are chosen between -pi/2 and pi/2. The value of acos(x) is chosen
between 0 and pi.
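Current Python 3 spares you the hand-made conversion constant, because math.radians and math.degrees convert between the two units. A brief sketch that also checks the ranges of the inverse functions. It uses math.isclose because, as the tan(45 degrees) result above shows, floating-point answers are rarely exact:

```python
import math

angle = math.radians(45)                     # same as 45*math.pi/180
print(math.isclose(math.tan(angle), 1.0))    # True

# asin returns an angle between -pi/2 and pi/2; convert it back to degrees.
print(math.degrees(math.asin(0.5)))          # approximately 30

# acos returns an angle between 0 and pi.
print(math.isclose(math.acos(-1), math.pi))  # True
```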


Computing with Complex Numbers 

Recall that in Python, the imaginary part of a complex number is indicated by a j
(not an i). The function complex(real[,imag]) creates a complex number. The
attributes real and imag of a complex number return its real and imaginary part,
respectively; and the conjugate method returns its complex conjugate, as shown
in the following example:

>>> (1-1j) * (1+1j)

(2+0j)

>>> complex(-5) + 3J # j or J, case doesn't matter
(-5+3j)

>>> x = (2+3j)

>>> x.real,x.imag,x.conjugate()

(2.0, 3.0, (2-3j))

>>> abs(x) # magnitude of x = hypot(x.real,x.imag)

3.6055512754639896 

The math functions operate only on real numbers; for instance, math.sqrt(-4)
raises a ValueError exception, because -4 has no real square root. math’s sister module,
cmath, provides functions for working with complex numbers. These cmath functions
accept complex input, but are otherwise the same as the corresponding math
functions: acos, asin, atan, cos, cosh, exp, log, log10, sin, sinh, sqrt, tan, and tanh.

In addition, cmath provides the inverse hyperbolic trigonometric functions
asinh(x), acosh(x), and atanh(x).
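A short sketch in current Python 3 showing math refusing a negative square root while cmath handles it. It also applies cmath.exp to a purely imaginary argument (Euler's identity, which holds only approximately in floating point):

```python
import math
import cmath

try:
    math.sqrt(-4)
except ValueError:
    print("math.sqrt(-4) raises ValueError")

print(cmath.sqrt(-4))                  # 2j
print(cmath.exp(1j * math.pi))         # very nearly -1, plus a tiny imaginary error
print(cmath.asinh(0))                  # 0j
```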


Generating Random Numbers

The random module provides a pseudo-random number generator. 

Random numbers 

Several functions are available to produce random numbers; you can also instantiate
your own random number generator.



Prior to Version 2.1, the random module used the whrandom module, which
provides much of the same functionality; however, the whrandom module is now
deprecated.
Random integers

The function randrange([start,]stop[,step]) provides a random number chosen
from the corresponding range (so stop itself is never returned). randrange is now
the preferred way to get a random integer, but you can also call randint(min,max),
which can return any integer from min through max, inclusive.
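The difference between the two endpoints matters: randrange never returns stop, while randint includes max. Here is a sketch in current Python 3. It uses a private, seeded random.Random instance (the seed value is arbitrary) so the run is repeatable:

```python
import random

rng = random.Random(2001)   # fixed seed so the run is repeatable

evens = [rng.randrange(0, 10, 2) for _ in range(1000)]  # 0, 2, 4, 6, or 8; never 10
dice = [rng.randint(1, 6) for _ in range(1000)]         # 1 through 6, inclusive

print(sorted(set(evens)))
print(sorted(set(dice)))
```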


Random floating-point numbers 

The function random() provides a floating-point number x such that 0<=x<1. The function
uniform(a,b) provides a floating-point number x such that a<=x<b.


Random selections 

The function choice(sequence) returns a randomly selected element of the specified
sequence. The function shuffle(sequence) shuffles a sequence in place.
(Note that the sequence must be mutable; to shuffle a tuple or string, first convert
it to a list.)
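Converting a tuple before shuffling looks like this in current Python 3 (the suit names are just sample data):

```python
import random

suits = ("Clubs", "Hearts", "Diamonds", "Spades")   # a tuple is immutable...
deck = list(suits)                                  # ...so shuffle a list copy
random.shuffle(deck)                                # shuffles in place, returns None
print(deck)

print(random.choice(suits))   # choice works on any sequence, tuples included
```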


Seeding the RNG 

The random number generator is not actually random, merely hard to predict. It is
deterministic, and its output is determined by its seed values. By default, random
seeds the generator with numbers derived from the current system time. But you
can seed it yourself by calling seed(x), where x is a hashable object. This example
seeds and re-seeds the generator:

>>> random.seed(123)

>>> random.random()

0.54140954469092906

>>> random.seed(123) # do it again!

>>> random.random() 

0.54140954469092906 

The functions in random are actually methods of the class random.Random. The
module automatically creates one instance of the class for you. If you like, you can
instantiate one or more Random instances yourself, to produce independent
streams of pseudo-random numbers. This is highly recommended for multi-threaded
programs, as two threads using the same random number generator may
receive the same numbers.
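A minimal sketch of independent generators in current Python 3 (the seed 123 is arbitrary). Two instances seeded alike produce the same stream, and neither is affected by calls to the module-level functions, which use the module's own hidden instance. Giving each thread its own instance follows the same pattern:

```python
import random

# Two generators seeded identically produce identical streams...
a = random.Random(123)
b = random.Random(123)
stream_a = [a.random() for _ in range(5)]
stream_b = [b.random() for _ in range(5)]
print(stream_a == stream_b)   # True

# ...and a private instance is not disturbed by the module-level functions.
c = random.Random(123)
random.random()
print([c.random() for _ in range(5)] == stream_a)   # True
```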

Generator state 

The random number generator keeps an internal state, which changes each time it
supplies a new random number. The function getstate returns a snapshot of its
current state, which you can restore using setstate(state). You can also call
jumpahead(n) to skip forward n steps in the stream of random numbers.

New Feature The methods getstate, setstate, and jumpahead are new in Version 2.1.
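jumpahead did not survive into Python 3, but getstate and setstate still work as described. Here is a sketch in current Python 3 (arbitrary seed) that rewinds a generator to replay part of its stream:

```python
import random

rng = random.Random(42)
saved = rng.getstate()             # snapshot of the internal state
first = [rng.randint(1, 100) for _ in range(3)]

rng.setstate(saved)                # rewind to the snapshot
again = [rng.randint(1, 100) for _ in range(3)]
print(first == again)              # True
```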
Example: shuffling a deck

The example shown in Listing 31-1 prints out a deck of playing cards in random 
order. 


Listing 31-1: Cards.py 


import random

# Represent a card as a tuple of the form (Value,Suit):
CARD_VALUES=["A",2,3,4,5,6,7,8,9,10,"J","Q","K"]
CARD_SUITS=["Clubs","Hearts","Diamonds","Spades"]

Cards=[]
for Suit in CARD_SUITS:
    for Value in CARD_VALUES:
        NewCard=(Value,Suit)
        Cards.append(NewCard)

random.shuffle(Cards)

for Card in Cards:
    print Card


Random distributions 

The random module provides functions that generate random numbers distributed
according to various formulae, such as the normal distribution. The following
statistics functions are available:

♦ betavariate(a,b): The beta distribution. Probability density is
$x^{a-1}(1-x)^{b-1} / B(a,b)$, where $B(a,b) = \Gamma(a)\Gamma(b) / \Gamma(a+b)$.
Both a and b must be greater than 0.

♦ cunifvariate(mean,arc): Circular uniform distribution. Both mean and arc
must be an angle (in radians) from 0 to pi.

♦ expovariate(lambda): The exponential distribution. Probability density is
$\lambda e^{-\lambda x}$.

♦ gammavariate(a,b): The gamma distribution. Probability density is
$x^{a-1} e^{-x/b} / (\Gamma(a)\, b^{a})$. a and b must both be larger than 0.

♦ gauss(mu,sigma): The Gaussian (normal) distribution with mean mu and
standard deviation sigma. This is slightly faster than normalvariate.

♦ lognormvariate(mu,sigma): The log normal distribution. The natural logarithm
of this distribution has mean mu and standard deviation sigma.

♦ normalvariate(mu,sigma): The normal distribution. Mean is mu, and the
standard deviation is sigma.

♦ paretovariate(a): The Pareto distribution. Probability density is $a / x^{a+1}$
for $x \ge 1$.

♦ vonmisesvariate(mu,kappa): The Von Mises distribution. Mean angle (in
radians) is mu, and kappa is the concentration parameter.

♦ weibullvariate(a,b): The Weibull distribution, with scale a and shape b.
Probability density is $(b/a)(x/a)^{b-1} e^{-(x/a)^{b}}$. a must be greater than 0;
b must be at least 1. Same as the exponential distribution (with mean a) if b=1.
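Current Python 3 keeps most of these functions (cunifvariate is gone, and betavariate and gammavariate require positive parameters). This is a rough sanity check, assuming a seeded private generator and the standard statistics module. The sample mean and standard deviation of many normalvariate draws should land near mu and sigma, and expovariate(lambd) should average about 1/lambd:

```python
import random
import statistics

rng = random.Random(31)   # fixed seed for a repeatable run
samples = [rng.normalvariate(100, 5) for _ in range(10000)]

# The sample mean and standard deviation should land near mu=100 and sigma=5.
print(round(statistics.mean(samples), 1), round(statistics.stdev(samples), 1))

# expovariate takes lambda (the rate), so the mean is about 1/lambda.
waits = [rng.expovariate(0.5) for _ in range(10000)]
print(round(statistics.mean(waits), 1))   # roughly 2.0
```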

Example: plotting distributions using Monte Carlo
sampling

Listing 31-2 plots different random distributions with a text graph. It uses a trick
called Monte Carlo sampling: It samples the distribution many times, and graphs the
sample results. These results approximate the actual random distribution.


Listing 31-2: Plotter.py 


import random

def MonteCarloSampler(DistributionFunction,Min,Max,
                      Step,Times=1000):
    """
    Call the distribution function the specified number
    of times. Divide the range [Min,Max] into intervals
    (buckets), each with width Step. Keep track of how
    many values fall into each bucket.
    """
    Buckets=[]
    BucketLeft=Min
    while BucketLeft<Max:
        Buckets.append(0)
        BucketLeft+=Step
    for Sample in range(Times):
        Value=DistributionFunction()
        # Skip values below Min (int() would truncate them into bucket 0):
        if Value<Min:
            continue
        BucketIndex = int((Value-Min)/Step)
        if BucketIndex<len(Buckets):
            Buckets[BucketIndex]+=1
    return Buckets

def PlotValues(Buckets,Height):
    """
    Plot a collection of values, scaling them to the specified
    height (in rows).
    """
    MaxValue = max(Buckets)
    ScaledBuckets=[]
    for Value in Buckets:
        ScaledBuckets.append(Value*Height/MaxValue)
    for RowNumber in range(Height,0,-1):
        for Value in ScaledBuckets:
            if Value>=RowNumber:
                print "*",
            else:
                print " ",
        print

NormalCaller = lambda: random.normalvariate(100,5)
Values=MonteCarloSampler(NormalCaller,80,120,1)
PlotValues(Values,20)

GammaCaller = lambda: random.gammavariate(0.5,5)
Values=MonteCarloSampler(GammaCaller,0,5,0.15)
PlotValues(Values,20)


Using Arbitrary-Precision Numbers 

The mpz module provides an interface to the integer functionality of the GNU
Multiple Precision Arithmetic Library (GMP). mpz is an optional module, and
requires GMP to work. Visit GMP’s home page at http://www.swox.com/gmp to
learn about installing and building GMP.

The mpz module enables you to do arithmetic using high-precision integers, or
mpz-numbers. You can construct an mpz-number with the function mpz(Number),
where Number is an integer, a long, another mpz-number, or an mpz-string. An
mpz-string is a binary representation of an mpz-number; it consists of an array of
radix-256 digits, with the least significant digit first. The method binary returns an
mpz-string for an mpz-number:

>>> SmallNumber = mpz.mpz(5) 

>>> SmallNumber # string representation has form mpz(#): 

mpz(5) 

>>> BigNumber = mpz.mpz(50000L) 

>>> BigNumber.binary() 

'P\303' 

>>> BigNumber % 256 # should equal ord('P'), or 80:

80 

>>> type(BigNumber)==mpz.MPZType # MPZType is for type-checking 
1 
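The mpz module has long since disappeared from the standard distribution; in current Python 3 the built-in int type is already arbitrary-precision. As a sketch of the binary() idea, int.to_bytes with byte order "little" produces the same least-significant-digit-first radix-256 string:

```python
big = 50000
raw = big.to_bytes((big.bit_length() + 7) // 8, "little")
print(raw)                               # b'P\xc3'
print(big % 256 == raw[0] == ord("P"))   # True
print(int.from_bytes(raw, "little"))     # 50000
print(2 ** 200)                          # no special type needed for big values
```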
An mpz-number has no other methods. It supports all the usual arithmetic operators,
as well as built-in functions such as abs, int, and so on.

The mpz module provides several extra functions for manipulating mpz-numbers.
Each function takes mpz-numbers as its argument(s), converting ints and longs if
necessary.

The function gcd(X,Y) returns the greatest common divisor of X and Y. The function
gcdext(X,Y) provides a tuple of the form (GCD,S,T) such that X*S + Y*T ==
GCD, and GCD is the greatest common divisor of X and Y.
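Without mpz, gcd is available as math.gcd in current Python 3, and gcdext takes only a few lines of the extended Euclidean algorithm. The gcdext below is a hypothetical pure-Python stand-in, assuming non-negative inputs:

```python
import math

def gcdext(x, y):
    """Return (g, s, t) with x*s + y*t == g == gcd(x, y) (extended Euclid)."""
    old_r, r = x, y
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

g, s, t = gcdext(240, 46)
print(g, s, t, 240 * s + 46 * t)   # 2 -9 47 2
print(math.gcd(240, 46))           # 2
```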

The function sqrt(X) returns the square root of X, rounding the result (if necessary)
toward zero. The function sqrtrem(X) returns a tuple (Root,Remainder) such
that Root*Root + Remainder == X; the tuple is chosen such that Remainder is as
small as possible.
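Here is a pure-Python stand-in for sqrtrem, assuming Python 3.8 or later for math.isqrt (which rounds down, toward zero, for non-negative input). sqrtrem here is a hypothetical helper, not a library function:

```python
import math

def sqrtrem(x):
    # math.isqrt gives the integer square root of x, rounded down.
    root = math.isqrt(x)
    return root, x - root * root

print(sqrtrem(50))              # (7, 1)
print(sqrtrem(10 ** 30 + 5))    # (1000000000000000, 5)
```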

The function powm(Base,Exponent,Modulus) raises Base to the power Exponent,
and then returns the result modulo Modulus. It is a faster shortcut for

(Base**Exponent)%Modulus,

because it never builds the full power Base**Exponent.

The function divm(Numerator,Denominator,Modulus) computes the quotient of
Numerator and Denominator modulo Modulus: a number Q such that
(Q*Denominator)%Modulus == Numerator. Modulus and Denominator must be
relatively prime, as shown here:

>>> mpz.divm(10,20,99) # 10/20 is equal to 50, modulo 99.
mpz(50) 
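In current Python 3 the built-in three-argument pow plays the role of powm. Since Python 3.8, pow(d, -1, m) also computes a modular inverse, which is enough for a divm stand-in. divm below is a hypothetical helper written for this sketch:

```python
def divm(numerator, denominator, modulus):
    """Return Q with (Q*denominator) % modulus == numerator % modulus."""
    # pow(d, -1, m) is the modular inverse of d; it raises ValueError
    # when denominator and modulus are not relatively prime.
    return numerator * pow(denominator, -1, modulus) % modulus

print(pow(3, 200, 7) == (3 ** 200) % 7)   # True: three-argument pow is powm
print(divm(10, 20, 99))                   # 50
```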


Summary 

Python can do complex arithmetic, trigonometric functions, and even some statistics.
Moreover, it can do it all very precisely. In this chapter, you:

♦ Did complex arithmetic and some simple trigonometry.

♦ Shuffled a deck of cards, with the help of random.

♦ Did high-precision integer arithmetic.

In the next chapter, you’ll learn all about Numeric Python (NumPy): powerful
extension modules for fast computation, matrix arithmetic, and much more.



Using NumPy 



The NumPy extension modules introduce a new sequence
type: the array. Arrays are fast (much faster than lists
or tuples for “heavy lifting” such as image processing). Arrays
also have many powerful methods and functions associated
with them, so they are often handy, even when speed isn’t an
issue.


Introducing Numeric Python 

Numeric Python (also known as NumPy) is a collection of
extension modules for number crunching. The core module,
Numeric, defines the array class and various helper functions.
This chapter focuses on the Numeric module. NumPy’s other
optional modules include the following:

♦ MA: Masked arrays. These are arrays that may have
some missing or invalid elements.

♦ FFT: Fast Fourier transforms

♦ LinearAlgebra: Linear algebra routines (calculation of
determinants, eigenvalues, and so on)

♦ RandomArray, RNG: Interface to random number
generators. These may be useful if the random module
doesn’t have what you need.

Installing NumPy 

Because NumPy is not part of the standard Python distribution,
the first order of business is to install it. The NumPy project
is hosted at SourceForge (http://sourceforge.net/projects/numpy).
Here, you can download the NumPy source code, or (for Windows)
a binary distribution. I recommend downloading the source tarball
in any case, as it includes a nice tutorial (in Demo\NumTut) and
some examples.


In This Chapter

Introducing Numeric Python

Accessing and slicing arrays

Calling universal functions

Creating arrays

Using element types

Reshaping and resizing arrays

Using other array functions

Array example: analyzing price trends
Some quick definitions

An array is a sequence: a collection of elements all of a particular type (usually
numeric). A universal function, or ufunc, is a function that takes an array (or other
sequence), acts on each element individually, and returns an array of results. The size
of an array (the total number of elements) is fixed. However, its shape may vary
freely; for example, a linear array of 12 elements may be reshaped into a 3 x 4 grid, a
2 x 2 x 3 cube, and so on. These shapes can be represented in Python as tuples of the
form (12,), (3,4), or (2,2,3). An array can have several dimensions, or axes.

Meet the array 

You can construct an array by calling array(sequence). Here, sequence is a collection
of values for the array. For example:

>>> import Numeric 

>>> Sample=Numeric.array([1,2,3,4,5]) 

>>> Sample # Print the array: 
array([1, 2, 3, 4, 5]) 

>>> # Remember not to do this: 

>>> BadSample=Numeric.array(1,2,3,4,5) # Too many arguments! 
Traceback (innermost last): 

File "<pyshell#236>", line 1, in ?

BadSample=Numeric.array(1,2,3,4,5) 

TypeError: function requires at most 4 arguments; 5 given 

A nested sequence results in a multi-dimensional array. However, note that the 
source sequence must form a valid shape: 

>>> Numeric.array([[1,2],[3,4], [5,6]]) # 3x2 array 
array([[1, 2], 

[3, 4], 

[5, 6]]) 

>>> Numeric.array([[1,2],[3,4,5]]) # Not rectangular! 

Traceback (innermost last): 

File "<pyshell#14>", line 1, in ?

Numeric.array([[1,2],[3,4,5]]) # not rectangular!

TypeError: an integer is required 


Accessing and Slicing Arrays 

You can access an array’s elements by index or by slice: 

>>> Fibonacci=Numeric.array((1,1,2,3,5,8,13))

>>> Fibonacci[4] # An element 
5 

>>> Fibonacci[:-1] # A slice (giving a sub-array) 
array([1, 1, 2, 3, 5, 8]) 
>>> Fibonacci[0]=44 # Arrays are mutable (but not resizable)
>>> Fibonacci # (We broke the Fibonacci series) 
array([44, 1, 2, 3, 5, 8, 13]) 

>>> MagicSquare=Numeric.array([[6,1,8],[7,5,3],[2,9,4]])

>>> MagicSquare[0] # The first row
array([6, 1, 8])

>>> MagicSquare[0][2] # A single element
8 


Arrays can be sliced along any axis, or along multiple axes at once. You provide the
slicing information for each axis, one by one. For example, following are some slices
on a 4 x 4 array:

>>> # Produce an array of the numbers 0 to 15: 

>>> Sixteen=Numeric.arrayrange(16)

>>> # Reshape the array into a 4x4 grid: 

>>> FourByFour=Numeric.reshape(Sixteen,(4,4)) 

>>> FourByFour 
[[ 0, 1, 2, 3,] 

[ 4, 5, 6, 7,] 

[ 8, 9,10,11,] 

[12,13,14,15,]] 

>>> FourByFour[1:3,1:3] # rows 1 and 2, columns 1 and 2 
[[ 5, 6,] 

[ 9,10,]] 

>>> FourByFour[:,0] # Every row, but only the first column
[ 0, 4, 8,12,] 

The array returned by a slice is not a copy of the old array, but a reference to the 
old array’s data. Note that this is different from the behavior of the slice operator 
on lists. Compare the results of the following two operations: 

>>> FirstList=[1,2,3,4,5]

>>> SecondList=FirstList[1:4] # Normal slice copies data
>>> SecondList[0]=25 # FirstList is unchanged! 

>>> FirstList 
[1, 2, 3, 4, 5] 

>>> FirstArray=Numeric.array(FirstList) 

>>> SecondArray=FirstArray[1:4] # Array slice doesn't copy data
>>> SecondArray[0]=25 # FirstArray is changed! 

>>> FirstArray 
[ 1,25, 3, 4, 5,] 
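The standard library shows the same copy-versus-view distinction without NumPy. This is an analogy rather than Numeric itself. In current Python 3, slicing an array.array copies its data, while slicing a memoryview over it shares the underlying buffer, much like a Numeric slice:

```python
from array import array

first = array("i", [1, 2, 3, 4, 5])

copy_slice = first[1:4]               # array slicing copies the data
copy_slice[0] = 25
print(first.tolist())                 # [1, 2, 3, 4, 5]

view_slice = memoryview(first)[1:4]   # a view onto the same buffer
view_slice[0] = 25
print(first.tolist())                 # [1, 25, 3, 4, 5]
```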

Note Some array manipulations make a copy of array data, while others provide a new
reference to the same data. Make sure that you know which you are doing;
otherwise, you may end up with two array variables that "step on each other's toes"!

Optionally, you can provide a third “step” parameter for an array slice. This enables 
you to take every nth element within a slice, or to reverse the order of a slice: 

>>> Sixteen[1:10:2] # Every other element from the slice
[ 1, 3, 5, 7, 9,]
>>> Sixteen[::-1] # Reverse the order of the slice
[15,14,13,12,11,10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,] 


Contiguous arrays 

An ordinary array is contiguous: its entries all live next to one another in memory.
Passing a slice-step is one way to get a noncontiguous array. The iscontiguous
method of an array returns true if the array is contiguous. Most functions don’t care
whether an array is contiguous or not, but some (such as the flat attribute) do:

>>> SomeNumbers=Numeric.arange(10)

>>> OddNumbers=SomeNumbers[1::2]

>>> OddNumbers.iscontiguous()

0

>>> OddNumbers.flat
Traceback (innermost last):

File "<pyshell#84>", line 1, in ?

OddNumbers.flat

ValueError: flattened indexing only available for contiguous 
array 

The function ravel(array) returns a one-dimensional, contiguous copy of an array.


Converting arrays to lists and strings 

You can extract array contents as a list (by calling the array’s tolist method) or as
a string (by calling tostring). For example, in the following 4x4 array, the letters
of each row and the matching column form the same word:

>>> MyArray=Numeric.array(["HORN","OBOE","ROSE","NEED"])

>>> MyArray
[[H,O,R,N,]

[O,B,O,E,]

[R,O,S,E,]

[N,E,E,D,]]

>>> MyArray[2] # The letters of row 2 form a word:

[R,O,S,E,]

>>> MyArray[:,2] # The letters of column 2 form the same word:
[R,O,S,E,]

I cannot compare one slice to another directly, because comparison operators are
not defined for arrays. However, by converting slices to lists, I can verify that the
column words are the same as the row words:

>>> MyArray[2]==MyArray[:,2] # == is not available for arrays 
Traceback (innermost last): 

File "<pyshell#315>", line 1, in ?

MyArray[2]==MyArray[:,2] 

TypeError: Comparison of multiarray objects other than rank-0 
arrays is not implemented. 
>>> MyArray[2].tolist()==MyArray[:,2].tolist()
1 


Calling Universal Functions 

Universal functions, or ufuncs, are performed elementwise; they affect each element
individually:

>>> A=Numeric.array([[1,2],[3,4]]) # 2x2 array
>>> Numeric.add(A,5) # Add 5 to each element
array([[6, 7], 

[8, 9]]) 

>>> A+5 # Operators are overloaded to ufuncs 

array([[6, 7], 

[8, 9]]) 

Two arrays of compatible shape and size can be added, multiplied, and so on.
These operations are also done element by element; therefore, multiplying two
arrays does not perform the matrix multiplication of linear algebra. (For that, call
the matrixmultiply function, or use the Matrix module.) For instance:

>>> B=Numeric.array([[5,6],[7,8]]) 

>>> A*B # Elementwise multiplication 
array([[ 5, 12], 

[21, 32]]) 

A ufunc can operate on any sequence, not just an array. However, its output is 
always an array. The Numeric module provides many ufuncs, whose names are 
fairly self-explanatory (see Table 32-1): 



Table 32-1
Universal Functions

Category          ufuncs
Arithmetic        add, subtract, multiply, divide, remainder
Powers and Logs   power, exp, log
Comparison        equal, not_equal, greater, greater_equal, less, less_equal,
                  minimum, maximum
Logic             logical_and, logical_or, logical_xor, logical_not
Trigonometry      sin, cos, tan, sinh, cosh, tanh, arcsin, arccos, arctan,
                  arcsinh, arccosh, arctanh
Bitwise           bitwise_and, bitwise_or, bitwise_xor, bitwise_not
Ufunc destinations

By default, a ufunc creates a brand-new array to store its results. An optional last
argument to a ufunc is the destination array. The output of a ufunc can be stored in
any appropriately sized array with compatible typecode. If the destination is the
same as the source array, an operation can be performed in place, as it is here:

>>> Numbers=Numeric.array((4,9,16),Numeric.Float)

>>> Numeric.sqrt(Numbers) # Elementwise square-root 
array([ 2., 3., 4.]) 

>>> Numbers # Original array is unchanged 

array([ 4., 9., 16.]) 

>>> Numeric.sqrt(Numbers,Numbers) # Take roots in place 
array([ 2., 3., 4.]) 

>>> Numbers # The original array WAS changed! 

array([ 2., 3., 4.]) 

Performing operations in place is more efficient than creating new arrays left and 
right. However, the destination must be compatible with the ufunc’s output, both in 
size and in typecode. For instance, the preceding square root example used a float 
array, because an in-place square root operation is not allowed on an int array: 

>>> Numbers=Numeric.array((4,9,16)) # (NOT a float array)

>>> Numeric.sqrt(Numbers,Numbers) 

Traceback (innermost last): 

File "<pyshell#33>", line 1, in ?

Numeric.sqrt(Numbers,Numbers) 

TypeError: return array has incorrect type 


Example: editing an audio stream 

Listing 32-1 provides an example of the power of the array class. We read in a 
stream of audio data as an array of numbers. The left and right channels of the 
stereo sound are mixed together — every other number represents sound on the 
left channel. We shrink the numbers corresponding to the left channel, and thereby 
make the left channel quieter without affecting the right channel. 

Cross-Reference See Chapter 24 for more information on audio operations in Python, including an
explanation of the wave module.


Listing 32-1: Quiet.py 


import Numeric
import wave

BUFFER_SIZE=5000

# NB: This is an 8-bit stereo .wav file. If it had a different
# sample size, such as 16-bits, we would need to convert the
# sequence of bytes into an array of 16-bit integers by
# calling Numeric.fromstring(Data,Numeric.Int16)
InFile=wave.open("LoudLeft.wav","rb")
OutFile=wave.open("QuietLeft.wav","wb")
OutFile.setparams(InFile.getparams())
while 1:
    # Read audio data as a string of bytes:
    Data=InFile.readframes(BUFFER_SIZE)
    if len(Data)==0:
        break
    # Create an array based on the string:
    Frames=Numeric.array(Data,
        typecode=Numeric.UnsignedInt8,savespace=1)
    # Take every other sample to get just the left side, and halve
    # each sample's distance from 128 (the silence level of unsigned
    # 8-bit audio). x/2+64 equals (x-128)/2+128 but never goes negative.
    # (We would like to use Numeric.divide(Frames[::2],2), but we can't,
    # because the returned array would have float type.)
    Frames[::2] = Frames[::2]/2+64
    OutFile.writeframes(Frames.tostring())

InFile.close()
OutFile.close()


Repeating ufuncs 

Each binary ufunc provides a reduce method. The reduce method of a ufunc is
similar to the built-in function reduce. It iterates over a sequence of array elements.
At each stage, it passes in (as arguments) the most recent output and the new
value. For example, multiply.reduce multiplies a sequence of numbers:

>>> Factors=Numeric.array((2,2,3,5)) 

>>> Numeric.multiply.reduce(Factors) 

60 

The reduce method takes a second, optional parameter: the axis to reduce over.
(By default, reduce combines values along the first axis.) For instance, suppose I
want to test whether a matrix is a magic square, wherein each row and column of
numbers has the same sum. I can call add.reduce to calculate all these sums:

>>> Square=Numeric.array([[1,15,14,4],[12,6,7,9], 

[8,10,11,5],[13,3,2,16]]) 

>>> Numeric.add.reduce(Square) # Sum over each column 
array([34, 34, 34, 34]) 

>>> Numeric.add.reduce(Square,1) # Sum over each row 
array([34, 34, 34, 34]) 

I can verify that the sums are all the same by checking whether minimum.reduce
and maximum.reduce give the same value, as that can only happen if the sequence
elements are all identical. With a few more lines of code, I have a function to recognize
magic squares, magic rectangles, magic cubes, or even magic hypercubes, as shown
in Listing 32-2:


Listing 32-2: MagicSquare.py 


import Numeric

def IsMagic(Array):
    TargetSum=None
    for Axis in range(len(Array.shape)):
        AxisSums=Numeric.add.reduce(Array,Axis)
        MinEntry=Numeric.minimum.reduce(AxisSums)
        MaxEntry=Numeric.maximum.reduce(AxisSums)
        # For 3 dimensions and up, MinEntry and MaxEntry
        # are still arrays, so keep taking minima and maxima
        # until they become scalars:
        while type(MinEntry)==Numeric.ArrayType:
            MinEntry=Numeric.minimum.reduce(MinEntry)
            MaxEntry=Numeric.maximum.reduce(MaxEntry)
        if (MinEntry!=MaxEntry):
            return 0
        if (TargetSum==None):
            TargetSum=MinEntry
        elif TargetSum!=MinEntry:
            return 0
    return 1

if __name__=="__main__":
    Square=Numeric.array([[1,15,14,4],[12,6,7,9],
        [8,10,11,5],[13,3,2,16]])
    print IsMagic(Square)
    Cube=Numeric.array([[[10,26,6],[24,1,17],[8,15,19]],
        [[23,3,16],[7,14,21],[12,25,5]],
        [[9,13,20],[11,27,4],[22,2,18]]])
    print IsMagic(Cube)


In addition to reduce, each binary ufunc has an accumulate method. A call to
accumulate retains all the intermediate results of the function. For example, I
could determine where a running total became negative:

>>> Numbers=Numeric.array((5,10,20,-4,-2,-10,-5,-3,-10,-2))
>>> Numeric.add.accumulate(Numbers)
array([ 5, 15, 35, 31, 29, 19, 14, 11,  1, -1])
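The standard library's itertools.accumulate keeps intermediate results in the same way, which makes it easy to sketch the "where did the total go negative?" question in plain Python (this is an analogy to add.accumulate, not Numeric itself):

```python
from itertools import accumulate

numbers = (5, 10, 20, -4, -2, -10, -5, -3, -10, -2)
running = list(accumulate(numbers))  # addition is the default operation
print(running)  # [5, 15, 35, 31, 29, 19, 14, 11, 1, -1]

# Index of the first negative running total (None if there is none).
first_negative = next((i for i, total in enumerate(running) if total < 0), None)
print(first_negative)  # 9
```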

Finally, each binary ufunc has an outer method. This method calls the ufunc many
times, once for each pair of elements from the two arrays. If A is an n-dimensional
array and B is an m-dimensional array, then outer(A,B) is an (n+m)-dimensional
array, where the element with coordinates (a1,a2,...,an,b1,b2,...,bm) is the output of
ufunc(A[a1][a2]...[an],B[b1][b2]...[bm]). For example, here is the effect of
outer multiplication:

>>> Numeric.multiply.outer([1,2,3],[4,5,6])
array([[ 4,  5,  6],
       [ 8, 10, 12],
       [12, 15, 18]])
>>> Numeric.multiply.outer([[1,2,3],[4,5,6]],(1,2))
array([[[ 1,  2],
        [ 2,  4],
        [ 3,  6]],
       [[ 4,  8],
        [ 5, 10],
        [ 6, 12]]])
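For two flat sequences, the outer operation is just a nested loop over all pairs. This hypothetical outer helper is a pure-Python sketch limited to one-dimensional inputs; Numeric's outer also handles multidimensional arrays:

```python
import operator

def outer(func, a, b):
    """Apply func to every pair (x, y) with x from a and y from b.

    Only flat sequences are handled; the result is a 2-D nested list.
    """
    return [[func(x, y) for y in b] for x in a]

print(outer(operator.mul, [1, 2, 3], [4, 5, 6]))
# [[4, 5, 6], [8, 10, 12], [12, 15, 18]]
```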


Creating Arrays 

The array constructor has syntax array(sequence[,typecode[,copy=1[,
savespace=0]]]). Here, sequence is (as you have seen) a source of data for the
array. The parameter typecode is an element type (as described in the next section). If
savespace is true, the type of the array's elements will not increase in precision:

>>> Squares=Numeric.array((4,9,16))
>>> SpaceSaverSquares=Numeric.array((4,9,16),savespace=1)
>>> Squares/float(5) # elements are all upcast to float
[ 0.8, 1.8, 3.2,]
>>> SpaceSaverSquares/float(5) # elements are NOT upcast!
[0,1,3,]

If copy is false and sequence is an array, the new array will be a reference into
the old array. This saves space and processing time, but remember that altering
either array will affect the other! This code creates two arrays that point to the
same block of memory:

>>> Array1 = Numeric.array((1,2,3,4,5))
>>> # Next line has same effect as Array2=Array1[:]
>>> Array2=Numeric.array(Array1,copy=0)
>>> Array2[2]=0
>>> Array1
[1,2,0,4,5,]

Array creation functions 

The function arrayrange([start,]stop[,step]) returns an array consisting of a
range of numbers; it is a shortcut for calling array(range(...)). The function
zeros(shape[,typecode[,savespace=0]]) creates a zero-filled array with the
specified shape. The function ones is similar, but fills the array with ones:
>>> Numeric.zeros(5)
[0,0,0,0,0,]
>>> Numeric.ones((3,3))
[[1,1,1,]
 [1,1,1,]
 [1,1,1,]]

You may encounter the word zeros if you create an empty array. For example, if I
take an empty slice of an array, the result is an empty (zero-length) array, which
Numeric displays as a call to zeros:

>>> bob=Numeric.array((1,2,3))
>>> bob[2:2] # Empty slice
zeros((0,), 'l')

The function identity(n) returns the n-by-n identity matrix as an array:

>>> Numeric.identity(3)
[[1,0,0,]
 [0,1,0,]
 [0,0,1,]]
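As a point of comparison, here are hypothetical pure-Python stand-ins for zeros and identity built from nested lists. The names mirror Numeric's, but these helpers are only a sketch; note that each row is built separately so rows do not alias one another:

```python
def zeros(shape):
    """Nested lists of 0 with the given shape (an int or a tuple of ints)."""
    if isinstance(shape, int):
        shape = (shape,)
    if len(shape) == 1:
        return [0] * shape[0]
    # Build each sub-list separately so the rows are independent objects.
    return [zeros(shape[1:]) for _ in range(shape[0])]

def identity(n):
    """n-by-n identity matrix: 1 on the diagonal, 0 elsewhere."""
    return [[1 if row == col else 0 for col in range(n)] for row in range(n)]

print(zeros(5))     # [0, 0, 0, 0, 0]
print(identity(3))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```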

You can combine several arrays into one big array with a call to
concatenate((arrays)[,glueaxis=0]). The arrays provided are “glued together” along the
specified axis. The arrays can have any size along axis glueaxis, but their sizes
along all other axes must match.
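The size rule can be sketched for two-dimensional nested lists. This hypothetical concatenate helper (pure Python, axes 0 and 1 only) checks that the arrays agree along the axis that is not being glued:

```python
def concatenate(arrays, glueaxis=0):
    """Glue 2-D nested lists together along axis 0 (rows) or axis 1 (columns)."""
    if glueaxis == 0:
        # Stacking rows: every array must have the same number of columns.
        if len({len(row) for array in arrays for row in array}) > 1:
            raise ValueError("arrays must match along axis 1")
        return [list(row) for array in arrays for row in array]
    if glueaxis == 1:
        # Joining side by side: every array must have the same number of rows.
        if len({len(array) for array in arrays}) > 1:
            raise ValueError("arrays must match along axis 0")
        return [[x for array in arrays for x in array[i]]
                for i in range(len(arrays[0]))]
    raise ValueError("only axes 0 and 1 are supported in this sketch")

print(concatenate(([[1, 2]], [[3, 4], [5, 6]])))         # [[1, 2], [3, 4], [5, 6]]
print(concatenate(([[1], [2]], [[3, 4], [5, 6]]), 1))    # [[1, 3, 4], [2, 5, 6]]
```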

The function indices(shape) provides a tuple of “index arrays” of the given shape.
The tuple has one element for each axis of shape, and the nth element corresponds to
the nth axis. Each tuple element is an array of the specified shape, in which each
entry’s value is equal to that entry’s index along the nth axis. Confused? Here is an example:

>>> Coords=Numeric.indices((2,3)) # a 2x3 box
>>> Coords[0] # First coordinates for each element
[[0,0,0,]
 [1,1,1,]]
>>> Coords[1] # Second coordinates for each element
[[0,1,2,]
 [0,1,2,]]
>>> Coords[0][1][2] # What's the first coordinate of (1,2)?
1
>>> Coords[1][1][2] # What's the second coordinate of (1,2)?
2
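For a two-dimensional shape, the index arrays are easy to build by hand. This hypothetical indices helper is a pure-Python sketch that only handles 2-D shapes:

```python
def indices(shape):
    """Return one nested-list 'index array' per axis of a 2-D shape."""
    rows, cols = shape
    row_index = [[r for c in range(cols)] for r in range(rows)]
    col_index = [[c for c in range(cols)] for r in range(rows)]
    return row_index, col_index

coords = indices((2, 3))
print(coords[0])                         # [[0, 0, 0], [1, 1, 1]]
print(coords[1])                         # [[0, 1, 2], [0, 1, 2]]
print(coords[0][1][2], coords[1][1][2])  # 1 2
```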


Seeding arrays with functions 

You can create an array from the output of an arbitrary function. The function
fromfunction(Generator,Shape) creates an array of the specified shape. The whole
array is produced by a single call to Generator: the arguments passed to Generator
are the contents of indices(Shape), and the array it returns supplies the element
values, as shown in the following example:
>>> Numeric.fromfunction(lambda X,Y: X+Y, (3,3))
[[0,1,2,]
 [1,2,3,]
 [2,3,4,]]
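A pure-Python version makes the idea concrete, with one important difference: this hypothetical fromfunction sketch calls the generator once per element with plain integers, whereas Numeric's calls it once with whole index arrays. Simple arithmetic generators such as X+Y work either way:

```python
def fromfunction(generator, shape):
    """Build a 2-D nested list by calling generator(row, col) for each cell.

    Unlike Numeric.fromfunction, which calls the generator once with whole
    index arrays, this sketch calls it once per element with plain ints.
    """
    rows, cols = shape
    return [[generator(r, c) for c in range(cols)] for r in range(rows)]

print(fromfunction(lambda X, Y: X + Y, (3, 3)))
# [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```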

When coding a call to fromfunction, one can often ignore the fact that Generator is
acting on arrays, and rely on elementwise array operations to do the work.

However, keep in mind that some operations (such as comparison) do not work
well with arrays. The example shown in Listing 32-3 calls the universal function
Numeric.minimum, because the built-in function min does not work on arrays. This
example prints, for each array entry, the remainder obtained by dividing the larger of
the entry’s two coordinates by the smaller (after adding 1 to each coordinate to avoid
division by 0). Listing 32-4 shows the script’s output.


Listing 32-3: Remainder.py 


import Numeric

def Remainder(X,Y):
    # Avoid division by 0 by adding 1 to the coordinates:
    X=X+1
    Y=Y+1
    Small=Numeric.minimum(X,Y)
    Large=Numeric.maximum(X,Y)
    return (Large%Small)

print Numeric.fromfunction(Remainder,(25,25))


Listing 32-4: Remainder.py output 


[[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1]
 [ 0  1  0  1  2  0  1  2  0  1  2  0  1  2  0  1  2  0  1  2  0  1  2  0  1]
 [ 0  0  1  0  1  2  3  0  1  2  3  0  1  2  3  0  1  2  3  0  1  2  3  0  1]
 [ 0  1  2  1  0  1  2  3  4  0  1  2  3  4  0  1  2  3  4  0  1  2  3  4  0]
 [ 0  0  0  2  1  0  1  2  3  4  5  0  1  2  3  4  5  0  1  2  3  4  5  0  1]
 [ 0  1  1  3  2  1  0  1  2  3  4  5  6  0  1  2  3  4  5  6  0  1  2  3  4]
 [ 0  0  2  0  3  2  1  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7  0  1]
 [ 0  1  0  1  4  3  2  1  0  1  2  3  4  5  6  7  8  0  1  2  3  4  5  6  7]
 [ 0  0  1  2  0  4  3  2  1  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5]
 [ 0  1  2  3  1  5  4  3  2  1  0  1  2  3  4  5  6  7  8  9 10  0  1  2  3]
 [ 0  0  0  0  2  0  5  4  3  2  1  0  1  2  3  4  5  6  7  8  9 10 11  0  1]
 [ 0  1  1  1  3  1  6  5  4  3  2  1  0  1  2  3  4  5  6  7  8  9 10 11 12]
 [ 0  0  2  2  4  2  0  6  5  4  3  2  1  0  1  2  3  4  5  6  7  8  9 10 11]
 [ 0  1  0  3  0  3  1  7  6  5  4  3  2  1  0  1  2  3  4  5  6  7  8  9 10]
 [ 0  0  1  0  1  4  2  0  7  6  5  4  3  2  1  0  1  2  3  4  5  6  7  8  9]
 [ 0  1  2  1  2  5  3  1  8  7  6  5  4  3  2  1  0  1  2  3  4  5  6  7  8]
 [ 0  0  0  2  3  0  4  2  0  8  7  6  5  4  3  2  1  0  1  2  3  4  5  6  7]
 [ 0  1  1  3  4  1  5  3  1  9  8  7  6  5  4  3  2  1  0  1  2  3  4  5  6]
 [ 0  0  2  0  0  2  6  4  2  0  9  8  7  6  5  4  3  2  1  0  1  2  3  4  5]
 [ 0  1  0  1  1  3  0  5  3  1 10  9  8  7  6  5  4  3  2  1  0  1  2  3  4]
 [ 0  0  1  2  2  4  1  6  4  2  0 10  9  8  7  6  5  4  3  2  1  0  1  2  3]
 [ 0  1  2  3  3  5  2  7  5  3  1 11 10  9  8  7  6  5  4  3  2  1  0  1  2]
 [ 0  0  0  0  4  0  3  0  6  4  2  0 11 10  9  8  7  6  5  4  3  2  1  0  1]
 [ 0  1  1  1  0  1  4  1  7  5  3  1 12 11 10  9  8  7  6  5  4  3  2  1  0]]
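Because a pure-Python version calls its function once per element, it can use the built-in min and max directly. This sketch reproduces the top-left 5-by-5 corner of the output above without Numeric:

```python
def remainder(x, y):
    # Shift to 1-based coordinates so no division by zero occurs,
    # then divide the larger coordinate by the smaller one.
    x, y = x + 1, y + 1
    return max(x, y) % min(x, y)

table = [[remainder(x, y) for y in range(5)] for x in range(5)]
for row in table:
    print(row)
```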


Using Element Types 

Array elements can have one of several types. Each type has a type code, a single
character that uniquely identifies it. The Numeric module provides constants for
most type codes. These constants do not vary by platform, although the
corresponding character may.

Type codes can be used as arguments to the array constructor; they can also be
retrieved from an array by calling its typecode method, as shown in the following
example:

>>> Word=Numeric.array("Blancmange") # An array of characters
>>> Word
[B,l,a,n,c,m,a,n,g,e,]
>>> Word.typecode() # Characters have typecode "c"
'c'
>>> Word=Numeric.array("Blancmange",Numeric.Int)
>>> Word # By overriding typecode, we made an array of ints:
[ 66,108, 97,110, 99,109, 97,110,103,101,]
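Python's standard array module uses the same idea of single-character type codes, although its codes are its own and are not guaranteed to match Numeric's. A small sketch, where 'B' is the array module's code for unsigned bytes:

```python
from array import array

# 'B' means unsigned char (0-255) in the standard array module.
word = array('B', b"Blancmange")
print(word.typecode)  # B
print(word.tolist())  # [66, 108, 97, 110, 99, 109, 97, 110, 103, 101]
print(word.itemsize)  # 1 (bytes per element)
```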

The most common typecodes are the numeric ones: Int, Float, and Complex. In
addition, these numeric typecodes have sized variants. For example, Int16 is
(usually) a 16-bit integer. If the operating system does not provide 16-bit integers, then
Int16 is the smallest integer type whose size is at least 16 bits. The typecodes
Int0, Int8, Int16, Int32, and (on some platforms) Int64 and Int128 are all
available. Analogous typecodes exist for Float and Complex (for example, Float32).

The other available typecodes are UnsignedInt8 (for numbers between 0 and 255)
and PyObject (for arrays of Python objects).


Reshaping and Resizing Arrays 

The array attribute shape holds an array’s current shape as a tuple. The function
reshape(OldArray,Shape) returns an array with the specified shape. No data is
copied; the new array holds references to the values in OldArray. The new shape
must have the same size as the old:


>>> Shapely=Numeric.array((1,2,3,4,5,6))
>>> Shapely.shape
(6,)
>>> Numeric.reshape(Shapely,(2,3))
[[1,2,3,]
 [4,5,6,]]


A one-dimensional version of any contiguous array is always available as the
member flat; an array’s total size is always equal to len(ArrayName.flat).


The function resize(OldArray,Shape) also returns an array with a new shape;
however, the new shape need not be the same size as the old. The old array will be
repeated or truncated as necessary to fill the new shape. The new array is a copy; it
does not hold references to the original data:


>>> Numeric.resize(Shapely,(3,3))
[[1,2,3,]
 [4,5,6,]
 [1,2,3,]]
>>> Numeric.resize(Shapely,(2,2))
[[1,2,]
 [3,4,]]
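The difference between the two functions can be sketched on a flat list. These hypothetical reshape and resize helpers (pure Python, 2-D shapes only) show that reshape insists on an unchanged total size, while resize cycles or truncates the data to fit:

```python
from itertools import cycle, islice

def reshape(flat, shape):
    """Regroup a flat list into rows; the total size must not change."""
    rows, cols = shape
    if rows * cols != len(flat):
        raise ValueError("new shape must have the same size as the old")
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]

def resize(flat, shape):
    """Like reshape, but repeat or truncate the data to fill the new shape."""
    rows, cols = shape
    data = list(islice(cycle(flat), rows * cols))
    return [data[r * cols:(r + 1) * cols] for r in range(rows)]

shapely = [1, 2, 3, 4, 5, 6]
print(reshape(shapely, (2, 3)))  # [[1, 2, 3], [4, 5, 6]]
print(resize(shapely, (3, 3)))   # [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
print(resize(shapely, (2, 2)))   # [[1, 2], [3, 4]]
```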




Using Other Array Functions 

In addition to the universal functions previously described, the Numeric module 
provides several other array-manipulation functions. The following sections 
describe some of the most useful ones. 


sort(array[,axis=-1])

This function returns a copy of the given array, sorted along the given axis: 

>>> People
array([[6, 7, 2],
       [8, 3, 5],
       [1, 9, 4]])
>>> Numeric.sort(People,0)
array([[1, 3, 2],
       [6, 7, 4],
       [8, 9, 5]])
>>> Numeric.sort(People,1)
array([[2, 6, 7],
       [3, 5, 8],
       [1, 4, 9]])
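Sorting along an axis of a 2-D nested list can be sketched in pure Python. In this hypothetical sort helper, axis 1 (the default, -1) sorts within each row, and axis 0 sorts each column independently:

```python
def sort(array, axis=-1):
    """Return a sorted copy of a 2-D nested list along the given axis."""
    if axis in (1, -1):
        # Sort the entries within each row.
        return [sorted(row) for row in array]
    # Axis 0: sort each column, then regroup the columns into rows.
    columns = [sorted(column) for column in zip(*array)]
    return [list(row) for row in zip(*columns)]

people = [[6, 7, 2], [8, 3, 5], [1, 9, 4]]
print(sort(people, 0))  # [[1, 3, 2], [6, 7, 4], [8, 9, 5]]
print(sort(people, 1))  # [[2, 6, 7], [3, 5, 8], [1, 4, 9]]
```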
where(condition,X,Y)

The where function treats the array condition as a mask for creating a new array. It 
returns an array of the same shape and size as condition. Each element of the new 
array is either X or Y. The new array element is X if the corresponding element of 
condition is true; it is Y if the corresponding element of condition is false: 


>>> Checkerboard=Numeric.resize((0,1),(5,5))
>>> Checkerboard
array([[0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 1],
       [0, 1, 0, 1, 0]])
>>> Numeric.where(Checkerboard,"Y","N")
array([[N, Y, N, Y, N],
       [Y, N, Y, N, Y],
       [N, Y, N, Y, N],
       [Y, N, Y, N, Y],
       [N, Y, N, Y, N]],'c')
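The masking rule of where can be written as a nested conditional expression. In this pure-Python sketch, the hypothetical where helper handles 2-D nested lists, and the checkerboard is built with (row + col) % 2, which gives the same pattern as the resize call above:

```python
def where(condition, x, y):
    """Pick x where condition is true and y where it is false (2-D lists)."""
    return [[x if flag else y for flag in row] for row in condition]

checkerboard = [[(r + c) % 2 for c in range(5)] for r in range(5)]
for row in where(checkerboard, "Y", "N"):
    print(" ".join(row))
```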


swapaxes(array,axis1,axis2)

This returns a new array that shares the data of the old, but with the specified axes
swapped. This is different from a call to reshape, because it actually transposes the array:

>>> TwoByThree=Numeric.array([[1,2,3],[4,5,6]])
>>> ThreeByTwo=Numeric.swapaxes(TwoByThree,0,1)
>>> ThreeByTwo
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> ThreeByTwo[2][1]==TwoByThree[1][2]
1
>>> Numeric.reshape(TwoByThree,(3,2)) # Different!
array([[1, 2],
       [3, 4],
       [5, 6]])
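The contrast between transposing and reshaping shows up clearly with plain lists. In this sketch, zip(*rows) swaps the two axes, while reshaping regroups the elements in their original row-major order:

```python
two_by_three = [[1, 2, 3], [4, 5, 6]]

# Swapping the two axes of a 2-D array is a transpose: rows become columns.
three_by_two = [list(column) for column in zip(*two_by_three)]
print(three_by_two)  # [[1, 4], [2, 5], [3, 6]]
print(three_by_two[2][1] == two_by_three[1][2])  # True

# Reshaping keeps the elements in their original (row-major) order instead.
flat = [x for row in two_by_three for x in row]
reshaped = [flat[i:i + 2] for i in range(0, len(flat), 2)]
print(reshaped)  # [[1, 2], [3, 4], [5, 6]]
```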


Matrix operations 

The function matrixmultiply(A,B) performs matrix (not elementwise!)
multiplication on A and B and returns the result. The function dot(A,B) returns
the dot product of two arrays.
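To see why the distinction matters, here is a pure-Python sketch of both products on 2-by-2 nested lists. The helper names mirror the text but are hypothetical stand-ins, not Numeric's implementation:

```python
def matrixmultiply(a, b):
    """Matrix product of two 2-D nested lists (rows of a times columns of b)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]

def multiply(a, b):
    """Elementwise product, which is what * does for arrays."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matrixmultiply(a, b))  # [[19, 22], [43, 50]]
print(multiply(a, b))        # [[5, 12], [21, 32]]
```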


The optional LinearAlgebra module provides several linear algebra functions that
operate on arrays. These include determinant(a), inverse(a), eigenvalues(a),
and solve_linear_equations(a,b). This example multiplies a matrix by its inverse:
>>> import LinearAlgebra
>>> Matrix=Numeric.array([[1,2,3],[4,5,6],[7,8,10]])
>>> Inv=LinearAlgebra.inverse(Matrix)
>>> Numeric.matrixmultiply(Matrix,Inv)
array([[  1.00000000e+000,   8.88178420e-016,  -4.44089210e-016],
       [  0.00000000e+000,   1.00000000e+000,  -1.77635684e-015],
       [  0.00000000e+000,   0.00000000e+000,   1.00000000e+000]])

Because LinearAlgebra does its work using floating-point numbers, multiplying the
matrix by its inverse does not yield the identity matrix exactly; however, the error
is extremely tiny. Note that LinearAlgebra.inverse will happily try (and fail!) to
provide an inverse for a non-invertible matrix.


Array Example: Analyzing Price Trends 

The script in Listing 32-5 uses an array of imaginary stock prices to compute
moving averages. A moving average is a computation, for each day, of the average stock
price for the last few days. The moving average can “smooth out” volatile changes 
in a stock price to a greater or lesser extent. For example, a five-day moving average 
is a relatively short-term measurement, whereas a 200-day moving average takes a 
more long-term view. 

Technical analysts use moving averages to help decide how to trade everything 
from stocks to pork bellies. This script will probably never beat the market, but it 
illustrates how easy it is to do number crunching with array functions. Listing 32-6
shows the script’s output.


Listing 32-5: MovingAverage.py 


import Numeric

Prices=Numeric.array([10,12,15,18,20,22,22,19,20,
                      23,24,28,30,25,23,20,18,15,
                      13,8,7,7,8,])

# NB: The MA (Masked Array) module provides an average function.
# Since it's a one-liner, we define it ourselves here:
def Average(Array):
    return Numeric.add.reduce(Array)/float(len(Array.flat))

def ProduceMovingAverage(StockPrices,Days):
    Slices=[]
    for LastDay in range(1,len(StockPrices)):
        SliceStart=max(0,LastDay-Days)
        Slices.append(StockPrices[SliceStart:LastDay])
    return map(Average, Slices)

# When the 5-day average crosses above the 11-day average,
# it may be a good idea to buy the stock. When the 5-day
# average drops below the 11-day average, it may be a good
# time to sell. (The correct day-lengths to use,
# and the effectiveness of the strategy, vary widely between
# markets.)
FiveDay=ProduceMovingAverage(Prices,5)
ElevenDay=ProduceMovingAverage(Prices,11)
print Numeric.greater(FiveDay,ElevenDay)


Listing 32-6: MovingAverage output 

[ 0000011111111111100000 ] 
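The same computation fits in a few lines of plain Python 3. This sketch mirrors ProduceMovingAverage exactly (including its choice to average the prices before LastDay, so it yields one fewer value than there are prices) and reproduces the crossover signal shown in Listing 32-6:

```python
prices = [10, 12, 15, 18, 20, 22, 22, 19, 20,
          23, 24, 28, 30, 25, 23, 20, 18, 15,
          13, 8, 7, 7, 8]

def moving_average(stock_prices, days):
    """For each last_day from 1 to len-1, average the (at most) `days`
    prices in stock_prices[max(0, last_day - days):last_day]."""
    averages = []
    for last_day in range(1, len(stock_prices)):
        window = stock_prices[max(0, last_day - days):last_day]
        averages.append(sum(window) / len(window))
    return averages

five_day = moving_average(prices, 5)
eleven_day = moving_average(prices, 11)
signal = [int(f > e) for f, e in zip(five_day, eleven_day)]
print(signal)
```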


Summary 

NumPy’s arrays are fast and flexible, and they mesh well with Python’s standard
structures, such as lists and tuples. If you need to handle many numbers at once,
arrays are probably a good choice, especially if efficiency is important. In this
chapter, you:

Created, resized, and manipulated arrays of hundreds of numbers. 

Discovered magic squares and magic cubes.

Analyzed the stock market with moving averages. 

In the next chapter, you’ll examine Python’s parsing, tokenizing, and reflection
capabilities.






Parsing and Interpreting Python Code

In This Chapter

Examining tracebacks
Introspection
Checking indentation
Tokenizing Python code
Example: syntax-highlighting printer
Inspecting Python parse trees
Low-level object creation
Disassembling Python code

Python provides powerful introspection features, made even more powerful by the
addition of function attributes in Version 2.1. With programmatic access to the
Python interpreter’s parser and disassembler, documentation, debugging, and
development become much easier.


Examining Tracebacks

If your program throws an uncaught exception, it exits, and the Python interpreter
prints a traceback, or stack trace. However, your program need not crash to use
traceback objects: the traceback module provides a suite of functions to work
with them.

One usually grabs a traceback with a call to sys.exc_info(), which returns a tuple
of the form (ExceptionType,ExceptionValue,Traceback). In an interactive session,
an unhandled exception populates the values sys.last_type, sys.last_value, and
sys.last_traceback; one often makes use of these with a call to pdb.pm().

Cross-Reference: See Chapter 27, on debugging, for more information about how to
use tracebacks with pdb.

Printing a traceback: print_exc and friends

The function print_exc([limit[,file]]) prints a traceback for the most recent
exception (as stored in sys.exc_info()). The optional parameter limit provides an
upper limit on how many stack frames to print. Normally, the exception is printed
to sys.stderr. Passing a file parameter causes the exception to be printed to a file.

606 Part V > Advanced Python Programming

You can also call print_last([limit[,file]]) to print the traceback for the
last uncaught exception in an interpreter session. A more general function is
print_exception(type,value,traceback[,limit[,file]]), which prints the
specified traceback for the specified exception. A call to
print_tb(traceback[,limit[,file]]) prints just a traceback (without exception info).
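A short Python 3 sketch of these printing functions, sending their output to io.StringIO objects instead of sys.stderr so the results can be inspected. The fail function is a hypothetical stand-in for code that raises:

```python
import io
import sys
import traceback

def fail():
    return 1 / 0

full_report = io.StringIO()
try:
    fail()
except ZeroDivisionError:
    # print_exc() reports the exception currently being handled;
    # file= redirects the report away from sys.stderr.
    traceback.print_exc(file=full_report)
    exc_type, exc_value, exc_tb = sys.exc_info()

# print_exception() takes the exception explicitly.
explicit = io.StringIO()
traceback.print_exception(exc_type, exc_value, exc_tb, file=explicit)

# print_tb() prints only the stack frames, with no exception line.
frames_only = io.StringIO()
traceback.print_tb(exc_tb, file=frames_only)
print(full_report.getvalue())
print(frames_only.getvalue())
```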

Extracting and formatting exceptions 

The function extract_stack grabs a stack trace from the current stack frame. The
function extract_tb(traceback[,limit]) grabs a stack trace from the specified
traceback. The return value of each function takes the form of a list of tuples. Each
element corresponds to a stack frame; the last element is the current stack frame.
Each element is a 4-tuple of the form (Filename, LineNumber, FunctionName,
LineText). For instance, this (excessively) recursive code (Listings 33-1 and 33-2)
prints a stack trace:


Listing 33-1: StackPrint.py


import traceback

def Factorial(n):
    if (n<2):
        print traceback.extract_stack()
        return 1
    return n*Factorial(n-1)

print Factorial(3)


Listing 33-2: Output of StackPrint.py


[('C:\\StackPrint.py', 9, '?', 'print Factorial(3)'),
 ('C:\\StackPrint.py', 7, 'Factorial', 'return n*Factorial(n-1)'),
 ('C:\\StackPrint.py', 7, 'Factorial', 'return n*Factorial(n-1)'),
 ('C:\\StackPrint.py', 5, 'Factorial', 'print traceback.extract_stack()')]
6
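For readers on Python 3, here is a sketch of the same experiment. In Python 3, extract_stack returns FrameSummary objects rather than plain tuples, but each one still unpacks into the four fields described above. The frames list is a hypothetical addition used to carry the stack out of the recursion:

```python
import traceback

def factorial(n, frames):
    """Recursive factorial that records the call stack at its base case."""
    if n < 2:
        # extract_stack() lists every active frame, oldest first;
        # the last entry describes this very call.
        frames.extend(traceback.extract_stack())
        return 1
    return n * factorial(n - 1, frames)

frames = []
result = factorial(3, frames)
# Each entry unpacks like the 4-tuple described in the text.
for filename, lineno, funcname, text in frames[-3:]:
    print(funcname, lineno, text)
print(result)  # 6
```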


You can format traceback tuples however you want. To use the standard
formatting, call format_exception(Type,Value,Traceback[,limit]). A formatted
exception is a list of one or more newline-terminated strings. You can format just a
stack trace with format_list(StackTrace), or just an exception with
format_exception_only(Type,Value). As a shortcut, call
format_tb(traceback[,limit]) to format a traceback directly, or call format_stack to
format the current call stack.
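The formatting functions return lists of strings rather than printing. This Python 3 sketch captures one exception and formats it three ways: in full, as the exception line only, and as a formatted list of extracted frames:

```python
import sys
import traceback

try:
    int("forty-two")
except ValueError:
    exc_type, exc_value, exc_tb = sys.exc_info()

# Header, one entry per frame, then the exception line itself.
lines = traceback.format_exception(exc_type, exc_value, exc_tb)
# Just the final "ValueError: ..." part.
only = traceback.format_exception_only(exc_type, exc_value)
# Format an extracted stack trace (a list of frame entries).
frames = traceback.format_list(traceback.extract_tb(exc_tb))
print("".join(lines))
print(only)
```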

Caution: If optimization (the -O switch) is active, the line numbers reported by a
traceback may be slightly off. The function tb_lineno(Traceback) computes the actual
line number for a traceback.


Example: reporting exceptions in a GUI 

Normally, printing tracebacks to a log file is sufficient. However, when debugging a 
GUI, it can be nice to see the traceback onscreen. The code in Listing 33-3 shows a 
simple way to report exceptions in a Tkinter window. 


Listing 33-3: GUIErrors.py


import Tkinter
import traceback
import sys

def LogError():
    TBStrings=traceback.format_exception(*sys.exc_info())
    for Line in TBStrings:
        TraceText.insert(Tkinter.END,Line)

def DoBadThings():
    try:
        smurflicious # bogus name
    except:
        LogError()

root=Tkinter.Tk()
TraceText=Tkinter.Text(root)
TraceText.pack()
BadButton=Tkinter.Button(root,text="DoBadThings",
                         command=DoBadThings)
BadButton.pack()
root.mainloop()


Eating arbitrary exceptions is bad for you 

Code that catches an exception and does nothing (the except: pass pattern) is
sometimes said to “swallow” the exception. This is often sensible. For example, a
call to os.makedirs raises an exception if the directory already exists. This OSError
is eminently edible. On the other hand, when one catches an arbitrary exception,
it’s best not to swallow it. Unforeseen problems may remain lurking in the program.
For instance, the exception could be a NameError due to a typo in your code.
Perform some minimal error handling, even in a quick-and-dirty script such as the 
following: 

try:
    DoLotsOfStuff() # This should never fail.
except:
    # Oh no! I don't know what to do. But I'd better
    # not just pass, or debugging will hurt.
    traceback.print_exc()

The time you spend typing that last line is your insurance against long, distracting 
interludes spent debugging. 
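Swallowing the "directory already exists" error can be done narrowly, so that every other failure still surfaces. This Python 3 sketch uses a hypothetical ensure_directory helper and a temporary directory; modern Python also offers os.makedirs(path, exist_ok=True) for this particular case:

```python
import errno
import os
import tempfile

def ensure_directory(path):
    """Create path (and its parents) unless it already exists as a directory."""
    try:
        os.makedirs(path)
    except OSError as error:
        # Swallow only the one failure we expect; anything else
        # (permissions, a plain file in the way, ...) still propagates.
        if error.errno != errno.EEXIST or not os.path.isdir(path):
            raise

target = os.path.join(tempfile.mkdtemp(), "logs", "today")
ensure_directory(target)
ensure_directory(target)  # already exists: the error is swallowed
print(os.path.isdir(target))  # True
```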


Introspection 

Omphaloskepsis is a fancy word meaning “contemplating one’s navel.” The
programming equivalent, introspection (also called reflection), is a fancy word for code
that can examine itself. With Python, you can programmatically browse information
such as function and class definitions. It is a handy way to generate documentation,
perform type checking, and more.

Review: basic introspection 

The built-in function hasattr(Object,MemberName) returns true if an object has
a member with the specified name. The function
getattr(Object,AttributeName[,Default]) returns the specified object
member, or Default if the object has no such member. And the function dir(Object)
returns a list of member names for an arbitrary object.

For example, suppose the MainApp object has various members. Some of the
members should be explicitly cleaned up (with a call to the cleanup method). The
following code would clean up each member:

for EntryName in dir(MainApp):
    Entry=getattr(MainApp,EntryName)
    if hasattr(Entry,"cleanup"):
        Entry.cleanup()
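Here is a runnable Python 3 version of the cleanup loop. The Resource and App classes are hypothetical, invented only to give dir, getattr, and hasattr something to find; note that dir returns names, so getattr is needed to fetch each member before testing it:

```python
class Resource:
    """Hypothetical member object that must be cleaned up explicitly."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def cleanup(self):
        self.log.append(self.name)

class App:
    """Hypothetical application object holding a mix of members."""
    def __init__(self, log):
        self.database = Resource("database", log)
        self.cache = Resource("cache", log)
        self.title = "demo"  # a plain string: no cleanup method

log = []
main_app = App(log)
for entry_name in dir(main_app):
    entry = getattr(main_app, entry_name)  # look the member up by name
    if hasattr(entry, "cleanup"):
        entry.cleanup()
print(sorted(log))  # ['cache', 'database']
```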

The built-in function issubclass(Child,Parent) returns true if Child is a
subclass of Parent. A class is considered a subclass of itself. The function
isinstance(Object,ClassOrType) returns true if the specified object is an
instance of the specified class, or has the specified type.

For example, a commonly used pattern is to check whether a variable X is a string
by testing type(X)==type(""). The problem with this is that X may be a Unicode
string! The following function is a better test for most purposes:
import types

def IsString(X):
    return (isinstance(X,types.StringType) or
            isinstance(X,types.UnicodeType))
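In Python 3 the picture is simpler: all text strings are str (Unicode), bytes is a separate type, and types.StringType no longer exists. A sketch of the equivalent test, plus the subclass rules from the previous paragraph, using hypothetical Base and Child classes:

```python
def is_string(x):
    # Python 3: every text string is a str; bytes is a different type.
    return isinstance(x, str)

class Base:
    pass

class Child(Base):
    pass

print(is_string("abc"), is_string(b"abc"))              # True False
print(issubclass(Child, Base), issubclass(Base, Base))  # True True
print(isinstance(Child(), Base))                        # True
```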

Browsing classes 

The module pyclbr provides a PYthon CLass BRowser (hence the name). It
browses Python source code directly; therefore, it can browse a module without
importing it, but it can’t browse a C extension module. The main function
readmodule(ModuleName[,Path]) parses the classes in the specified module file.
The optional parameter Path is a list of directories to add to the module search
path sys.path. The return value of readmodule is a dictionary, where each key is a
class name, and each value is a class descriptor.

A class descriptor has several data members. The members name, module, file,
and lineno provide the class name, module name, module file name, and definition
line number, respectively. The following examines the FTP class from ftplib:

>>> import pyclbr
>>> FTPDescriptor=pyclbr.readmodule("ftplib")["FTP"]
>>> FTPDescriptor.name
'FTP'
>>> FTPDescriptor.lineno
75

The member methods is a dictionary, mapping the name of each method to the line
number on which it is defined. The member super is a list of class descriptors for
the class’s base classes; super has length 1 for single inheritance. If readmodule
doesn’t have a class descriptor for a base class, the corresponding entry in super
is the base class name (as a string) instead.
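The following Python 3 sketch exercises name, lineno, methods, and super on a small module. The module name clbr_demo_mod and its classes are hypothetical; the source is written to a temporary directory, which is passed as the Path argument, and pyclbr parses it without importing it:

```python
import os
import pyclbr
import tempfile

# Hypothetical module source; pyclbr reads it as text, never imports it.
SOURCE = '''\
class Base:
    def greet(self):
        return "hi"

class Child(Base):
    def greet(self):
        return "hello"

    def wave(self):
        pass
'''

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "clbr_demo_mod.py"), "w") as f:
    f.write(SOURCE)

classes = pyclbr.readmodule("clbr_demo_mod", [tmpdir])
child = classes["Child"]
print(child.name, child.lineno)             # Child 5
print(sorted(child.methods.items()))        # [('greet', 6), ('wave', 9)]
print([base.name for base in child.super])  # ['Base']
```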

Browsing function information 

A function (or method) has attributes. Several built-in attributes are available for 
every function, as shown in Table 33-1. 


Table 33-1
Built-in Function Attributes

Name            Description
func_name       Function name (as a string)
func_doc        Function’s docstring; same as the __doc__ member
func_dict       Dictionary of user-defined attribute names and values
func_globals    Global namespace of the function; same as m.__dict__, where m is
                the module defining the function
func_defaults   Default function parameters, as a tuple
func_code       The function, as a code object; suitable for passing to exec or eval
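Python 3 spells these attributes with double underscores: func_name became __name__, func_doc __doc__, func_dict __dict__, func_globals __globals__, func_defaults __defaults__, and func_code __code__. A quick sketch with a hypothetical greet function:

```python
def greet(name, greeting="Hello"):
    """Return a friendly greeting."""
    return "%s, %s!" % (greeting, name)

print(greet.__name__)      # greet
print(greet.__doc__)       # Return a friendly greeting.
print(greet.__defaults__)  # ('Hello',)
print(greet.__dict__)      # {} (no user-defined attributes yet)
print(type(greet.__code__).__name__)  # code
print(greet.__globals__["greet"] is greet)  # True: the defining module's namespace
```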


You can also set arbitrary attributes on any Python function (but not on a built-in
function).

New Feature: Function attributes are a new feature in Python 2.1.


For example, the function in Listing 33-4 checks a software version number (as a
string) to ensure that it is a valid dotted-decimal. It uses function attributes to track
the number of calls and the number of successes. Listing 33-5 shows the output.


Listing 33-4: FunctionAttributes.py 


import re

DottedDecimalRegExp=re.compile(r"^[0-9]+(\.[0-9]+)*$")

def CheckVersionNumber(Str):
    # One way to handle function attributes is to assume
    # they are uninitialized until proven otherwise:
    OldCount = getattr(CheckVersionNumber,"CallCount",0)
    CheckVersionNumber.CallCount = OldCount+1
    if (DottedDecimalRegExp.search(Str)):
        CheckVersionNumber.SuccessCount+=1
        return 1
    return 0

# One way to handle function attributes is to
# initialize them up-front. (Unlike this example,
# you will want to choose one pattern and stick with it)
CheckVersionNumber.SuccessCount=0

print CheckVersionNumber("3.5")
print CheckVersionNumber("2")
print CheckVersionNumber("3.4.5.")
print CheckVersionNumber("35.")

print "Total calls:",CheckVersionNumber.CallCount
print "Valid version numbers:",CheckVersionNumber.SuccessCount
Listing 33-5: Output of FunctionAttributes.py

1
1
0
0
Total calls: 4
Valid version numbers: 2
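The same counters translate directly to Python 3. This sketch renames things in Python 3 style (check_version_number, call_count, and success_count are illustrative names) and shows that the attributes live in the function's __dict__:

```python
import re

DOTTED_DECIMAL = re.compile(r"^[0-9]+(\.[0-9]+)*$")

def check_version_number(s):
    """Return 1 if s is a dotted-decimal version number, else 0."""
    # Lazily created attribute: assume it is missing until proven otherwise.
    check_version_number.call_count = getattr(check_version_number, "call_count", 0) + 1
    if DOTTED_DECIMAL.search(s):
        check_version_number.success_count += 1
        return 1
    return 0

check_version_number.success_count = 0  # attribute initialized up front

results = [check_version_number(v) for v in ("3.5", "2", "3.4.5.", "35.")]
print(results)                             # [1, 1, 0, 0]
print(check_version_number.call_count)     # 4
print(check_version_number.success_count)  # 2
```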


Checking Indentation 

The module tabnanny is a safeguard against ambiguous indentation in Python 
code. To quote the docstring: “The Tab Nanny despises ambiguous indentation. 
She knows no mercy.” Run the module from the command line to check a file. For 
example, suppose you created a source file in which one line is indented with tabs, 
and another is indented with spaces. (This sort of mismatched whitespace usually 
happens when people with different text editors are sharing and editing the same 
source files.) The Tab Nanny will not be pleased: 

> tabnanny.py -v parsing.py
'parsing.py': *** Line 8: trouble in tab city! ***
offending line: ' print "testing!"\012'
indent not equal e.g. at tab sizes 1, 2, 3, 4, 5, 6, 7


Tokenizing Python Code 

Parsing source code can be a bit of a chore. Fortunately, Python’s standard libraries
can parse code for you.

The function tokenize.tokenize(Readline[,Processor]) reads from an input
stream, tokenizes code, and passes each token along to a processor. The Readline
parameter is generally the readline method of a filelike object. It should return
one line of input per call, and return an empty string when no data remains. The
Processor parameter is called once for each token, with five arguments:
(TokenType, TokenString, (StartRow,StartColumn), (EndRow,EndColumn),
Line). Here, TokenType is a numeric code, TokenString is the token itself, and
Line is the source line on which the token was found. The default processor
prints out the token information:

>>> import StringIO, tokenize
>>> Code=StringIO.StringIO("str='hi there'")
>>> tokenize.tokenize(Code.readline)
1,0-1,3:    NAME      'str'
1,3-1,4:    OP        '='
1,4-1,14:   STRING    "'hi there'"
2,0-2,0:    ENDMARKER ''
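In Python 3, the callback style shown above is gone: tokenize.generate_tokens takes a readline function that returns str and yields token tuples instead of calling a processor (tokenize.tokenize itself now expects a bytes readline). A sketch of the same example:

```python
import io
import tokenize

code = io.StringIO("str='hi there'")
for tok in tokenize.generate_tokens(code.readline):
    # Each token carries its type, text, start (row, col), end (row, col),
    # and the source line it came from.
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)
```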
The token module provides various token-type constants (such as STRING and
ENDMARKER, as shown in the preceding printout). It provides a dictionary,
tok_name, which maps from token types to token-name strings. It also provides the
function ISEOF(TokenType), which returns true if the token is an end-of-file
marker. The tokenize module exports all of the TokenType constants of token, as
well as one additional one: COMMENT (the TokenType of a Python comment).

A useful parsing-related module is keyword. It provides one function,
iskeyword(str), which returns true if str is a Python keyword.
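These helpers still exist in Python 3 under the same names. A quick sketch of tok_name, ISEOF, the COMMENT token type, and keyword.iskeyword:

```python
import keyword
import token
import tokenize

print(token.tok_name[token.STRING])         # STRING
print(token.ISEOF(token.ENDMARKER))         # True
print(tokenize.tok_name[tokenize.COMMENT])  # COMMENT
print(keyword.iskeyword("while"), keyword.iskeyword("str"))  # True False
```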


Example: Syntax-Highlighting Printer 

Listing 33-6 uses the tokenizer to provide a syntax-highlighted HTML version of 
Python source code. It uses the keyword module to look up Python keywords. 


Listing 33-6: SyntaxHighlighter.py 


import tokenize
import cgi
import keyword

KEYWORD="Keyword"

# Use a dictionary to keep track of what HTML tags we
# will put before and after each token.
TOKEN_START_HTML={tokenize.NAME:"<font color=BLUE>",
                  tokenize.COMMENT:"<font color=RED>",
                  tokenize.STRING:"<font color=GREEN>",
                  KEYWORD:"<font color=ORANGE>",
                  }

TOKEN_END_HTML={tokenize.NAME:"</font>",
                tokenize.COMMENT:"</font>",
                tokenize.STRING:"</font>",
                KEYWORD:"</font>",
                }

class SyntaxHighlighter:
    def __init__(self,Input,Output):
        self.Input=Input
        self.Output=Output
        self.OldColumn=0
        self.OldRow=0

    def ProcessToken(self,TokenType,TokenString,StartTuple,
                     EndTuple,LineText):
        # If this token starts after the last one ended,
        # then maintain the whitespace:
        if StartTuple[0]>self.OldRow:
            self.OldColumn=0
        Whitespace = " "*(StartTuple[1]-self.OldColumn)
        self.Output.write(Whitespace)
        # Special case: Variable names and Python keywords
        # both have token type NAME, but we'd like the keywords
        # to show up in a different color. So, we switch the
        # token type to suit our needs:
        if (TokenType==tokenize.NAME and
            keyword.iskeyword(TokenString)):
            TokenType=KEYWORD
        # Pretoken tags:
        PreToken = TOKEN_START_HTML.get(TokenType,"")
        self.Output.write(PreToken)
        # The token itself:
        self.Output.write(cgi.escape(TokenString))
        # Posttoken tags (empty for tokens we don't color):
        PostToken = TOKEN_END_HTML.get(TokenType,"")
        self.Output.write(PostToken)
        # Track where this token ended:
        self.OldRow=EndTuple[0]
        self.OldColumn=EndTuple[1]

    def PrintHighlightedCode(self):
        self.Output.write("<HTML><PRE>")
        tokenize.tokenize(self.Input.readline,
                          self.ProcessToken)
        self.Output.write("</PRE></HTML>")

Input=open("SyntaxHighlighter.py","r") # highlight ourself!
Output=open("SyntaxHighlighter.html","w")
Highlighter=SyntaxHighlighter(Input, Output)
Highlighter.PrintHighlightedCode()
Output.close()
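For Python 3, where cgi.escape and the callback form of tokenize are no longer available, the same approach can be sketched with tokenize.generate_tokens and html.escape. The highlight function and its COLORS table are hypothetical names; the whitespace bookkeeping follows the listing's row/column logic:

```python
import html
import io
import keyword
import tokenize

# Colors for the token kinds we highlight; other tokens are copied as-is.
COLORS = {"keyword": "ORANGE", tokenize.NAME: "BLUE",
          tokenize.COMMENT: "RED", tokenize.STRING: "GREEN"}

def highlight(source):
    """Return an HTML <PRE> rendering of Python source with colored tokens."""
    out = io.StringIO()
    out.write("<HTML><PRE>")
    row, col = 1, 0  # where the previous token ended
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        (srow, scol), (erow, ecol) = tok.start, tok.end
        if srow > row:  # token starts on a new line: restart the column
            col = 0
        out.write(" " * (scol - col))  # preserve spaces between tokens
        kind = tok.type
        if kind == tokenize.NAME and keyword.iskeyword(tok.string):
            kind = "keyword"
        color = COLORS.get(kind)
        text = html.escape(tok.string)
        if color:
            out.write("<font color=%s>%s</font>" % (color, text))
        else:
            out.write(text)
        row, col = erow, ecol
    out.write("</PRE></HTML>")
    return out.getvalue()

print(highlight("def f(x):\n    return x  # done\n"))
```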


Inspecting Python Parse Trees 


When Python code is parsed, it is stored internally in an Abstract Syntax Tree
(AST). The parser module provides you with access to AST objects. You can
convert back and forth between sequences and AST objects, in order to manipulate an
expression.



Manipulating ASTs is not for the faint of heart: they are low-level beasts that may
vary from one release of Python to the next.


Creating an AST 

The function parser.expr(source) parses the provided expression, and returns
the resulting AST. It parses a single expression, in the same way that
compile(source,"file.py","eval") would. The function
parser.suite(source) parses a suite of statements, in the same way that
compile(source,"file.py","exec") would. Both functions raise a
ParserError if they cannot parse the code.

ASTs and sequences 

The AST method totuple([LineInfo]) returns a tuple representation of the AST.
The tuple contains many deeply nested subtuples. Each tuple is either a terminal
element (a token) or a nonterminal element (a symbol).

Each terminal element of the source is represented by a tuple of the form
(TokenType,TokenString[,LineNumber]). Here, LineNumber is provided only if the
LineInfo parameter (passed to totuple) was true. The constants in the token
module provide readable names for terminal element types.

Each nonterminal element of the source is represented by a tuple of the form
(SymbolType,SubElement[,SubElement...]). Here, SymbolType is one of the symbol
constants provided in the symbol module, and each SubElement is a child element
(either terminal or nonterminal).

Similarly, the AST method tolist([LineInfo]) returns a list representation of the
AST. You can produce an AST from a sequence by calling the function
sequence2ast(Sequence).
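
Because every nonterminal lists its children right after its SymbolType, a short recursive function can pull the original token strings back out of a totuple() result. The sketch below (leaves is a hypothetical helper, written for Python 3) uses only the token module, whose ISTERMINAL() tells terminals from nonterminals. Since the parser module is gone from current Python, it walks a small hand-built tree for x + 1; the numbers 300 and 301 are placeholders, not real symbol-module constants.

```python
import token

def leaves(tree):
    """Collect the TokenString values from a totuple()-style nested tuple."""
    kind = tree[0]
    if token.ISTERMINAL(kind):
        # Terminal: (TokenType, TokenString[, LineNumber])
        return [tree[1]]
    # Nonterminal: (SymbolType, SubElement, SubElement, ...)
    result = []
    for child in tree[1:]:
        result.extend(leaves(child))
    return result

# Hypothetical tree for "x + 1"; the first leaf carries a line number.
sample = (300,
          (301, (token.NAME, "x", 1)),
          (token.PLUS, "+"),
          (301, (token.NUMBER, "1")))
print(leaves(sample))  # ['x', '+', '1']
```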

Using ASTs 

An AST object has several methods. The method isexpr returns true if the AST
corresponds to a single expression; conversely, issuite returns true if the AST
corresponds to a block of code. The method compile([filename]) compiles the
AST into a code object, suitable for passing to exec (if issuite is true) or to eval
(if isexpr is true). The dummy file name defaults to <ast>.


Low-Level Object Creation 

The new module provides functions to create a new instance, class, function,
module, or method.

The function instance(Class,Members) creates and returns an instance of Class
with the specified member dictionary (i.e., the new object's __dict__ attribute will
be Members).

The function instancemethod(function, instance, class) returns a new
method object. If instance is None, the new method is an unbound (class) method.

The function function(code, globals[, name[, defaults]]) creates a function
with the specified code (as a code object) and the specified globals (as a dictionary).
If specified, defaults should be a tuple of default arguments for the function.

The function module(name) creates a new module with the specified name.

The function classobj(name, BaseClasses, NameSpace) creates a new class with
the specified name. BaseClasses is a tuple (possibly empty) of base classes, and
NameSpace is the class’s namespace dictionary.
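
The new module itself was dropped in Python 3. As a rough sketch of where its functions went, assuming a current Python 3 interpreter: types.FunctionType, types.ModuleType, and types.MethodType, together with the built-in type(), cover function(), module(), instancemethod() (bound form only, since Python 3 has no unbound method objects), and classobj(). The names template, triple, scratch, Point, and get_dims are purely illustrative.

```python
import types

def template(x):
    return x * factor  # 'factor' is resolved in the function's globals

# Like new.function(code, globals[, name[, defaults]]): reuse template's
# code object with a different globals dictionary and a default argument.
triple = types.FunctionType(template.__code__, {"factor": 3}, "triple", (2,))
print(triple(5), triple())   # 15 6

# Like new.module(name)
scratch = types.ModuleType("scratch")
print(scratch.__name__)      # scratch

# Like new.classobj(name, BaseClasses, NameSpace)
Point = type("Point", (object,), {"dims": 2})

# Like new.instancemethod(function, instance, class), bound form only
p = Point()
get_dims = types.MethodType(lambda self: self.dims, p)
print(get_dims())            # 2
```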

The normal way of creating things is usually the right way, but occasionally the
low-level power of new is useful. For example, suppose that Employee and Person are
classes with similar data members. You could create a Person from an Employee by
using new, as shown in Listing 33-7:


Listing 33-7: UsingNew.py 


import new

class Employee:
    pass

class Person:
    pass

Bob=Employee()
Bob.Name="Bob"
Bob.SSN="123-45-6789"
Bob.ManagerName="Earl"

# Passing Bob.__dict__ gives rise to some unnatural behavior
# later on; passing Bob.__dict__.copy() would be healthier!
BobThePerson=new.instance(Person,Bob.__dict__)

print BobThePerson.Name
BobThePerson.SSN="987-65-4321"
print Bob.SSN # It has changed!!!
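
On Python 3, where new.instance() no longer exists, a similar effect can be sketched by creating a bare Person with object.__new__() and assigning its __dict__; treat this as an approximation, not a drop-in replacement. The sketch takes the healthier route that the listing's comment recommends and hands over a copy, so changing the new object's SSN leaves Bob's alone.

```python
class Employee:
    pass

class Person:
    pass

bob = Employee()
bob.Name = "Bob"
bob.SSN = "123-45-6789"
bob.ManagerName = "Earl"

# Make a Person without calling any __init__, then give it a *copy*
# of bob's attribute dictionary so the two objects stay independent.
bob_the_person = object.__new__(Person)
bob_the_person.__dict__ = bob.__dict__.copy()

bob_the_person.SSN = "987-65-4321"
print(bob_the_person.Name)   # Bob
print(bob.SSN)               # 123-45-6789, unchanged this time
```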


Disassembling Python Code 

Python code is compiled into byte code before execution, for improved efficiency.
This byte-compiled code is stored on disk in .pyc files. The dis module enables you
to disassemble and examine this byte code. The main function, dis([Source]),
disassembles byte code and prints the results. The parameter Source may be a
function or method, a code object, or a class. The function distb([tb])
disassembles the top function of a traceback object. By default, both dis and distb
disassemble the last traceback.

Each line of output contains the instruction address, the opcode name, the
operation parameters, and the interpretation of the operation parameters:
>>> def Tip(Bill):
...     return Bill * 0.15
...
>>> dis.dis(Tip)
          0 SET_LINENO          1

          3 SET_LINENO          2
          6 LOAD_FAST           0 (Bill)
          9 LOAD_CONST          1 (0.14999999999999999)
         12 BINARY_MULTIPLY
         13 RETURN_VALUE
         14 LOAD_CONST          0 (None)
         17 RETURN_VALUE


The instructions at 6 and 9 push the two values onto the stack; the instructions at
12 and 13 multiply them and return the result. Notice that the instructions at 14 and
17 (which return None) will never actually execute. (Experiments have been done
with an optimizing Python compiler; such a compiler might well omit these
extraneous instructions!)
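
The exact listing depends heavily on the interpreter version: current Python 3 releases no longer emit SET_LINENO, and recent ones replace BINARY_MULTIPLY with a generic BINARY_OP. To inspect the instructions from code instead of reading printed output, dis.get_instructions() (available since Python 3.4) yields one Instruction object per opcode. A sketch assuming Python 3, where tip mirrors the Tip function above:

```python
import dis

def tip(bill):
    return bill * 0.15

# Each Instruction carries the offset, opcode name, and a readable argument.
for instr in dis.get_instructions(tip):
    print(instr.offset, instr.opname, instr.argrepr)

# Collect just the opcode names for programmatic checks.
opnames = [instr.opname for instr in dis.get_instructions(tip)]
```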

The attribute dis.opname is a sequence of operation code names; the index of each
opcode is its byte code. The dis module provides several sequences for keeping
track of the available opcodes. For example, haslocal is a sequence of byte codes
that access a local variable:

>>> dis.haslocal
[124, 125, 126]
>>> dis.opname[125] # Look up the opcode name for this byte code
'STORE_FAST'

Consult the Python documentation for a full list of the operation codes and their
behavior.
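
The byte code numbers differ between Python versions (the 124 to 126 shown above belong to that release), so rather than hard-coding them it is safer to go through dis.opmap, which maps opcode names to byte codes. A Python 3 sketch:

```python
import dis

# dis.opmap maps opcode names to byte codes; dis.opname maps them back.
store_fast = dis.opmap["STORE_FAST"]
print(store_fast, dis.opname[store_fast])

# Names of every opcode that takes a local-variable index as its argument.
local_ops = sorted(dis.opname[op] for op in dis.haslocal)
print(local_ops)
```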


Summary 


When it is feeling introspective, Python can parse itself, compile itself, tokenize 
itself, and even disassemble itself. All this flexibility makes programming Python 
easier. In this chapter, you: 

♦ Reported errors in a graphical user interface.

♦ Used function attributes to track some simple statistics.

♦ Created an HTML page of Python code, complete with syntax highlighting.

♦ Parsed and disassembled source code.

The next chapter deals with internationalizing applications. This is where Unicode 
starts to really come in handy! 
Creating Worldwide Applications

The modules covered in this chapter help you create programs that are easily
adaptable to different languages and countries. These tools extract language- and
region-specific information so that, without additional programming, your program
will work well with users who speak different languages or have different local
customs than your own.
Internationalization and
Localization


Internationalization is the process by which a program is prepared for using a
different language than that of the programmer. Localization is the process by
which an internationalized program is adapted to the end-user’s choice of language
and customs. Together they make up what is known as native language support, or
NLS.

Note Due to the annoying length of the words internationalization and localization,
a popular abbreviated form is to write the first and last letters and place between
them the number of remaining letters. Thus internationalization becomes i18n and
localization becomes l10n.


Internationalization isn’t usually difficult. If you write your 
program with the idea that you will be running it in different 
languages, then adding internationalization support requires 
little effort. If you are retrofitting an existing application, the 
work isn’t hard but merely tedious. The internationalization
techniques in this chapter deal with marking strings in your 
application as ones that need to be translated. Special tools 
then extract these strings and lump them together in a 
human-readable file that you pass to a translator. 
With a little effort, localization can happen almost automatically. Given the file
containing translations for the marked strings in your program, Python’s tools will
look up the translated version before displaying textual messages to the user.
Additionally, there are functions that help you format numbers, currencies, dates,
and so forth, without requiring you to know the different formats for every single
region in the world. Each set of region-specific settings is known as a locale, and
there are pre-built libraries of common locales throughout the world.

Note Python's native language support routines are largely based on GNU's native
language support project. Visit the gettext section on www.gnu.org for interesting
links and more information.


Preparing Applications for Multiple 
Languages 

This section walks you through the process of preparing a tiny program for using 
different languages. For a real application, you’ll follow these steps in a different 
order, but the order given here is better for a first-time look at the process. At first, 
it may seem like a lot of work, but after you’ve been through it all once, you’ll see
that it’s actually quite simple. 

An NLS example 

Not all strings in an application need to be localizable (translatable). File names,
development error messages, and other strings that aren’t visible to the user can
remain in your native language. Mark the strings that do need to be translated by
sending them to a dummy function named _(s):

def _(s): return s

print _('What do you want to do today?')
print '1 -', _('Bake something')
print '2 -', _('Play with food')
i = raw_input(_('Enter 1 or 2: '))
if i == '1':
    print _('Oh boy! Baking!')
else:
    print _('Food is fun!')

The function name can be anything, but the single underscore character is the
conventional choice because it doesn’t pollute your source code too much, it doesn’t
take too much extra effort to include it, and it’s very unlikely that you’re already
using a function of the same name. Moreover, some processing tools may be
expecting that you follow the herd and use the same convention.
In Python’s Tools/i18n directory lives pygettext.py, a tool that extracts strings
tagged for translation and places them into a human-readable file. Using the
preceding example program (saved as chef.py), you extract the tagged strings as
follows:

d:\Python20\Tools\i18n\pygettext.py chef.py

Normally, you won’t see any output from running this (unless you use -h for help),
but it generates a messages.pot file such as the following:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"PO-Revision-Date: Wed Feb 14 20:31:20 2001\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
"Generated-By: pygettext.py 1.1\n"

#: chef.py:5
msgid "What do you want to do today?"
msgstr ""

#: chef.py:6
msgid "Bake something"
msgstr ""

#: chef.py:7
msgid "Play with food"
msgstr ""

#: chef.py:10
msgid "Oh boy! Baking!"
msgstr ""

#: chef.py:12
msgid "Food is fun!"
msgstr ""

#: chef.py:8
msgid "Enter 1 or 2: "
msgstr ""

This template file can then be copied and edited to form a language-specific
version. For the following example, we downloaded an echeferizer, a program that
translates text into the language spoken by the Swedish chef from the Muppets.


I took messages.pot, added translations, and saved it as messages.po (the
following text shows only the lines that changed):

#: chef.py:5
msgid "What do you want to do today?"
msgstr "Vhet du yuoo vunt tu du tudey?"

#: chef.py:6
msgid "Bake something"
msgstr "Beke-a sumetheeng"

#: chef.py:7
msgid "Play with food"
msgstr "Pley veet fuud"

#: chef.py:10
msgid "Oh boy! Baking!"
msgstr "Ooh buy! Bekeeng!"

#: chef.py:12
msgid "Food is fun!"
msgstr "Fuud is foon!"

#: chef.py:8
msgid "Enter 1 or 2: "
msgstr "Inter 1 oor 2: "

The gettext module understands translation files in the .mo format, so use the
msgfmt.py tool (also in Python’s Tools/i18n directory) to convert from the .po
format:

d:\Python20\Tools\i18n\msgfmt.py messages.po

Once again, no output message means success, although you should now find a
messages.mo file in the current directory. Make a directory off your current
directory called chef, and in it create another directory called LC_MESSAGES. Now
move messages.mo into that LC_MESSAGES directory (I’ll explain why in a minute).

The final step is to replace the underscore function with a translator function from
Python’s gettext module. (Of course, you could have skipped using the dummy
function altogether and used gettext from the get-go, but I wanted to keep it
simple.) Replace the old underscore function with the following:

import gettext 

_ = gettext.translation('messages', '.').gettext

Tip Instead of using the translation object's gettext method, you can use ugettext
to have it return the string as a Unicode string.

Back on the command line, set the environment variable LANGUAGE to chef and run
the program:


C:\temp>set LANGUAGE=chef
C:\temp>chef.py
Vhet du yuoo vunt tu du tudey?
1 - Beke-a sumetheeng
2 - Pley veet fuud
Inter 1 oor 2: 1
Ooh buy! Bekeeng!

What it all means 

Now that you’ve seen an example, you can better understand the process. The
underscore function acts as a lookup function that receives an original string and
returns a translated string. The work of extracting the strings, translating them, and
converting the file to the .mo format is pretty straightforward. (Python uses the .mo
format because that’s what GNU uses and there are third-party tools that use the
same format.)

The gettext.translation(domain[, localdir[, languages[, class]]])
function returns an instance of a translation class that handles lookups for
you. domain is useful if you want to group strings by module or category, and
localdir is the base path from which to search for translation files (if omitted, it
looks in the default system locale directory). If the languages parameter is
omitted, the function searches through the environment variables LANGUAGE, LC_ALL,
LC_MESSAGES, and LANG to decide which language to use. class lets you supply
your own class to parse the translation file; if omitted, the GNUTranslations class
is used.

Tip gettext.install(domain[, localdir[, unicode]]) installs the underscore
function in Python's built-in namespace so that all modules will be able to
access it. Use this only when you want to force the entire application to use the
same language.

Based on the argument and environment information, gettext looks in
localdir/language/LC_MESSAGES/domain.mo for a translation file, and opens
and processes it, although it first passes the language to gettext._expand_lang
to get a list of directory names it will check for:

>>> import gettext
>>> gettext._expand_lang('french')
['fr_FR.ISO8859-1', 'fr_FR', 'fr.ISO8859-1', 'fr']
>>> gettext._expand_lang('american')
['en_US.ISO8859-1', 'en_US', 'en.ISO8859-1', 'en']
>>> gettext._expand_lang('chef')
['chef'] # Unknown locale returned as is


You could place a single English translation in an en directory so that all English- 
speaking users would get that one translation; or you could provide translations 
that differ for Australia and the United States, for example. 
When you ship your program, you would include .mo files for each language you
wish to support. Based on the user’s environment variables, your program
automatically displays itself in the correct language.

Note The gettext module also has a set of APIs that closely mirror the GNU C APIs,
but using the class-based APIs discussed in this section is the method of choice;
it's flexible and much more Pythonic. For example, you can create your own
translation class, and you can localize each module separately, instead of the entire
application.


Formatting Locale-Specific Output 

The locale module helps you localize program output by formatting numbers and
strings according to the rules of an end-user’s locale. The following sections show
you how to query and set various properties of the current locale.

Changing the locale 

The default locale is called the C locale, but you can change the locale with
setlocale(category[, value]). Each locale is a set of rules for formatting
currencies, dates, and so on, and you can use the category argument to specify what
part of the locale you want to switch. Table 34-1 lists the different categories you
can use. If value is omitted, the current locale for the given category is returned.



Table 34-1
Locale Categories

Category        Affects rules dealing with...
LC_ALL          All subcategories
LC_TIME         Time formatting
LC_MESSAGES     Operating system-generated messages
LC_NUMERIC      Number formatting
LC_MONETARY     Currency formatting
LC_COLLATE      String sorting
LC_CTYPE        Character functions
In general, you switch all categories at the same time:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'german')
'German_Germany.1252'

Calling setlocale with an empty string for the value argument switches to the
user’s default locale (which is discovered by looking in environment variables such
as LANGUAGE, LC_ALL, LC_CTYPE, and LANG or by querying the operating system).

Note Many users set their locale in site.py, which is loaded when Python starts up, so
before setting the locale, you should first verify that it isn't already something other
than the default C locale.

setlocale is not generally thread-safe, so if you do call it, be sure to do so near the
beginning of the program if possible. Programs running in embedded Python
interpreters should not set the locale, but if the embedding application sets the locale
before the interpreter starts, Python will use the new locale setting.

Locale-specific formatting 

str(f) formats a floating-point number using the user’s locale settings to decide
what decimal character to use:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'german')
'German_Germany.1252'
>>> locale.str(5.21)
'5,21'

The format(format, val[, grouping]) function formats a number just as the
normal % operator would, except that it also takes into account the user’s
numerical separator characters. If grouping is 1 instead of the default of 0, a
grouping character (such as a thousands separator) is used:

>>> locale.format('%5.2f',12345.23)
'12345,23'
>>> locale.format('%5.2f',12345.23,1)
'12.345,23'
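
locale.format() was deprecated and then removed in Python 3.12; its replacement, locale.format_string(), takes the same format and value plus a grouping keyword. Locale names are platform-specific (Windows accepts 'german', while Unix systems usually want something like 'de_DE.UTF-8', which may not be installed), so the Python 3 sketch below checks the always-available C locale and only tries the German names opportunistically:

```python
import locale

# The "C" locale always exists, so this part runs anywhere.
locale.setlocale(locale.LC_ALL, "C")
plain = locale.format_string("%.2f", 12345.23, grouping=True)
print(plain)  # 12345.23 (the C locale defines no grouping separator)

# These locale names are assumptions; skip any that are not installed.
for name in ("de_DE.UTF-8", "german"):
    try:
        locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        continue
    print(name, locale.format_string("%.2f", 12345.23, grouping=True))
    break
else:
    print("no German locale installed")

locale.setlocale(locale.LC_ALL, "C")  # restore a predictable state
```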

atof(str) and atoi(str) convert a string to a floating-point number or integer,
taking into account the user’s grouping and decimal characters. The following uses
the preceding locale settings:

>>> locale.atof('1.000.002,5')
1000002.5

strcoll(s1, s2) compares two strings using the lexicographic rules of the user’s
locale:
>>> locale.setlocale(locale.LC_ALL,'us')
'English_United States.1252'
>>> locale.strcoll('chump', 'coward') # 'ch' < 'co' in English
-1
>>> locale.setlocale(locale.LC_ALL, 'sp')
'Spanish_Spain.1252'
>>> locale.strcoll('chump', 'coward') # In Spanish, 'ch' > 'c'
1

In order to compare strings using non-native lexicographic rules, strcoll first
transforms the strings in such a way that a normal string compare yields the
correct result. If you will be performing many comparisons of the same string
(sorting, for example), you can instead call strxfrm(s) to get the transformed format.
This would calculate it only once, after which you can use Python’s normal
comparisons, such as cmp and the equality operators.
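
Python 3 has no cmp(), but the same idea carries over to sorting: pass locale.strxfrm as the key function, and each string is transformed once rather than on every comparison. The sketch below uses the C locale so it runs anywhere; for ASCII text the C locale collates in plain code-point order, so substitute a locale your system actually has installed (an assumption about your setup) to see a difference:

```python
import locale

locale.setlocale(locale.LC_ALL, "C")

words = ["coward", "chump", "Zebra", "apple"]
# strxfrm runs once per word; the sort then compares the transformed keys.
by_locale = sorted(words, key=locale.strxfrm)
print(by_locale)

# strcoll returns a negative, zero, or positive number, like the old cmp().
print(locale.strcoll("chump", "coward") < 0)   # True in the C locale
```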

Properties of locales 

Each locale has a set of attributes describing its various rules. The localeconv()
function returns a dictionary containing the rules for the current locale. The keys of
this dictionary and their meanings are listed in Table 34-2.


Table 34-2
Keys for the localeconv Dictionary

Key                 Meaning                                                       U.S. English Example
decimal_point       Decimal-point character                                       .
mon_decimal_point   Monetary decimal point                                        .
thousands_sep       Number grouping character                                     ,
mon_thousands_sep   Monetary grouping character                                   ,
currency_symbol     Local currency symbol                                         $
int_curr_symbol     International currency symbol                                 USD
positive_sign       Sign for positive money values                                <blank>
negative_sign       Sign for negative money values                                -
frac_digits         Number of fractional digits used in local monetary values     2
int_frac_digits     Number of fractional digits used in international values      2
p_cs_precedes       1 if currency symbol precedes value for positive values       1
n_cs_precedes       1 if currency symbol precedes value for negative values       1
p_sep_by_space      1 if space between positive value and currency symbol         0
n_sep_by_space      1 if space between negative value and currency symbol         0
p_sign_posn         Sign position, positive money values                          3
n_sign_posn         Sign position, negative money values                          0
grouping            List of separator positions                                   [3,0]
mon_grouping        List of separator positions, for monetary values              [3,0]

For p_sign_posn and n_sign_posn, a value of 0 means that the currency and the
value are enclosed in parentheses; 1 means that the sign comes before the value
and the currency symbol; and 2 means that the sign follows the value and the
currency symbol. A value of 3 means that the sign immediately precedes the value, and
4 means that the sign immediately follows the value. A value of CHAR_MAX means
nothing is specified for this locale.

The grouping and mon_grouping attributes have lists of numbers specifying the
positions where “thousands” (numerical grouping) separators should be put. If the
last entry is CHAR_MAX, no further grouping is performed after the next-to-last
position has been used. If the last entry is 0, the last group is repeated, so [3, 0] means
place the separator character every three digits.
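
To make those rules concrete, here is a small Python 3 sketch that applies a grouping list to a string of digits. grouping_sizes and group_digits are hypothetical helpers written for illustration (the locale module performs this step internally when you ask for grouping), and they handle only a bare digit string, not signs or decimals:

```python
import locale

def grouping_sizes(grouping):
    """Yield group sizes, rightmost group first, from a localeconv()-style list."""
    last = None
    for size in grouping:
        if size == locale.CHAR_MAX:      # stop: no further grouping
            return
        if size == 0:                    # repeat the previous size forever
            if last is None:
                raise ValueError("0 cannot be the first grouping entry")
            while True:
                yield last
        yield size
        last = size

def group_digits(digits, sep, grouping):
    """Insert sep into a plain string of digits according to grouping."""
    groups = []
    for size in grouping_sizes(grouping):
        if not digits:
            break
        groups.append(digits[-size:])
        digits = digits[:-size]
    if digits:                           # whatever is left forms one group
        groups.append(digits)
    return sep.join(reversed(groups))

print(group_digits("1234567", ",", [3, 0]))                # 1,234,567
print(group_digits("1234567", ",", [3, locale.CHAR_MAX]))  # 1234,567
print(group_digits("1234567", ",", []))                    # 1234567
```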


Summary 

Adding native language support to your application makes it possible for your
programs to adapt themselves to the locale of the end-user, without requiring you to
know the customs of every single region in the world. In this chapter, you:

♦ Flagged translatable strings in your program and extracted them with
Python’s tools.

♦ Created a translation table for your application and ran it in a different language.

♦ Formatted numeric output according to the rules of the end-user’s locale.

The next chapter shows you how to take control of and modify the standard module
import behavior.






Customizing 
Import Behavior 



In most cases, the normal behavior for importing modules
is just what you need: You give Python a module name and 
it finds and loads the module code and adds a new module 
object to the current namespace. Occasionally, however, you 
may need to change the way the import process works. This 
chapter covers the several mechanisms Python provides for 
easily creating custom module import behavior. 


In This Chapter

Understanding module importing

Finding and loading modules with imp

Importing encrypted modules

Retrieving modules from a remote source


Understanding Module Importing

When the Python interpreter processes the import statement, it calls the function
__import__(name[, globals[, locals[, fromlist]]]), which in turn locates the module
called name, retrieving its byte code so that a new module object can be created.
The globals and locals parameters hold the global and local dictionaries so that
__import__ can determine the context in which the import is taking place.
fromlist is a list of items to import from the module when the from x import y
form of import is used.

Cross-Reference Chapter 6 describes modules, packages, and the import statement.

The primary reason __import__ exists in Python (as opposed to being accessible
only via the Python/C API) is so that you can modify or track module imports. For
example, the following code replaces the normal importer with a function that
informs you of each module being loaded:

oldimp = __import__ # Save a reference to the original
def newimp(name, globals=None, locals=None, fromlist=None):
    # Display info about the import request
    if not fromlist:
        print ':: import', name
    else:
        print ':: from', name, 'import', ', '.join(fromlist)
    # Now call the original function
    return oldimp(name, globals, locals, fromlist)

# Install the hook in the built-in namespace seen by every module
import __builtin__
__builtin__.__import__ = newimp

After running the preceding code, you can see that import calls are indeed routed 
to the new function, including imports that other modules request on their own: 

>>> import os
:: import os
>>> os = reload(os)
:: import sys
:: from nt import *
:: from nt import _exit
:: import ntpath
:: import UserDict

Tip The knee module in the standard Python distribution is an example of replacing
the built-in __import__ function. It doesn't add new functionality, but it is useful
for seeing how things work.
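
In Python 3 the same hook lives in the builtins module (Python 2 calls it __builtin__), and __import__ takes an extra level argument for relative imports. A sketch assuming Python 3, with hypothetical names original_import, seen, and logging_import; it records requests instead of printing them and restores the real importer afterward, because a leftover hook affects every later import:

```python
import builtins

original_import = builtins.__import__
seen = []   # (module name, fromlist) pairs, in request order

def logging_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Record the request, then defer to the saved original importer.
    seen.append((name, tuple(fromlist or ())))
    return original_import(name, globals, locals, fromlist, level)

builtins.__import__ = logging_import
try:
    import json            # now routed through logging_import
    from os import path    # fromlist arrives as ('path',)
finally:
    builtins.__import__ = original_import   # always put the real importer back

print(seen)
```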

Another use of __import__ is to modify the module before returning it to the caller.
For example, the following code adds a timestamp to each module, marking when it
was originally loaded:

import sys, time
import __builtin__

oldimp = __import__

def newimp(name, globals=None, locals=None, fromlist=None):
    try:
        # Keep the timestamp of a module that was loaded before
        mod = sys.modules[name]
        first_load = mod.first_load
    except (AttributeError, KeyError):
        first_load = time.time()
    mod = oldimp(name, globals, locals, fromlist)
    mod.first_load = first_load
    return mod

__builtin__.__import__ = newimp
The module maintains its original timestamp, even if reloaded:


>>> import md5 
>>> md5.first_load 
982444108.24399996 
>>> md5 = reload(md5) 

>>> md5.first_load 
982444108.24399996 

Note Some modules will have already been loaded by the time your import hook is
called, so they won't have a timestamp unless they are loaded again later.


Instead of completely replacing Python’s import behavior, other modules let you
replace or extend only parts of it. The following sections cover the imp and
imputil modules.



The ihooks module is another way to modify module import behavior; it is
currently used by rexec (restricted execution). New programs should avoid using
ihooks, and use imp and imputil instead.


Finding and Loading Modules with imp 

The imp module gives you access to some of the behind-the-scenes functionality
associated with module importing. It’s useful if you’re creating your own module
importer or working with Python module files.

Each byte-compiled (.pyc) file has a special header identifying it as Python
bytecode; this header can vary from one version of Python to the next to signify a
change in bytecode format. get_magic() returns the header for the current
version:

>>> import imp 
>>> imp.get_magic() 

'\207\306\015\012'

get_suffixes() returns a list of module suffixes that Python uses when searching
for modules. The list contains tuples of the form (suffix, mode, type):

>>> imp.get_suffixes() 

[('.pyd', 'rb', 3), ('.dll', 'rb', 3), ('.py', 'r', 1),
 ('.pyc', 'rb', 2)]

The mode tells what mode should be passed to the open function to read the file
contents, and type tells the type of the module. imp defines a variable to name
each type, as listed in Table 35-1.
Table 35-1
Module Type Values

Value   Type Name          Module is...
1       PY_SOURCE          Source code
2       PY_COMPILED        Bytecode
3       C_EXTENSION        Dynamically-loaded C extension
4       PY_RESOURCE        Source code as a program resource (Mac)
5       PKG_DIRECTORY      A package directory
6       C_BUILTIN          Statically-linked C extension
7       PY_FROZEN          Bytecode generated by the Freeze utility (see Chapter 36)
8       PY_CODERESOURCE    Bytecode as a program resource (Mac)

find_module(name[, pathlist]) locates a module with the given name or raises
ImportError if it can’t find the module. pathlist is a list of directories in which
find_module will look, returning the first match it can find. If you don’t supply a
list of paths, find_module first checks to see if the module exists as a built-in or
frozen module. Next, it searches in special platform-specific locations (the system
registry on Windows and as a program resource on Macintosh). Finally, it will look
through the paths listed in sys.path. When searching for a module, find_module
finds files that have the same name as the name argument and that have any of the
extensions in the list returned by get_suffixes.

The value returned from find_module is a 3-tuple of the form (file, path,
description). file is an open file object for the module file (ready for reading the
file contents), path is the full path to the file on disk, and description is a tuple
like the ones get_suffixes uses:

>>> imp.find_module('asynchat')
(<open file 'D:\Python20\lib\asynchat.py', mode 'r' at 0172E900>,
'D:\\Python20\\lib\\asynchat.py', ('.py', 'r', 1)) # 1 is PY_SOURCE

If the module isn’t a file on disk, there is no file object (None), and the path is just the module name:

>>> imp.find_module('md5' ) 

(None, 'md5', ('', '', 6)) # 6 is C_BUILTIN

Note that find_module doesn’t handle hierarchical names; locating such modules
is a multi-step process:
>>> imp.find_module('wxPython')
(None, 'D:\\Python20\\wxPython', ('', '', 5))
>>> imp.find_module('wx', ['d:\\python20\\wxPython'])
(<open file 'd:\python20\wxPython\wx.py', mode 'r' at
017D07C8>, 'd:\\python20\\wxPython\\wx.py', ('.py', 'r', 1))

load_module(name, file, filename, description) loads the module called
name (reloading it if it was already loaded). The file, filename, and description
arguments are the same as the values returned from find_module, but name is the
full module name (for example, wxPython.wx). load_module returns a module
object or raises ImportError.

Note load_module does not close the file object after it reads in the module. Be sure
to close it yourself, especially if the load fails and an exception is raised.
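
On Python 3, without imp, the closest counterpart to the find_module/load_module pair for a file at a known path is importlib.util.spec_from_file_location() followed by module_from_spec() and the loader's exec_module(). The loader opens and closes the source file itself, so the cleanup chore in the note goes away. A sketch assuming Python 3, where scratch_mod is a throwaway module written to a temporary directory:

```python
import importlib.util
import os
import tempfile

# A throwaway module on disk (scratch_mod is a hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "scratch_mod.py")
with open(path, "w") as f:
    f.write("a = 10\nb = 'Hello'\n")

# Roughly find_module + load_module: build a spec, create the module, run it.
spec = importlib.util.spec_from_file_location("scratch_mod", path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)   # the loader opens and closes the file
# Unlike load_module, these steps do not add the module to sys.modules.
print(module.a, module.b)         # 10 Hello
```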

You can create a new, empty module object by calling new_module(name). The
module object returned is not inserted into sys.modules and has two members:
__name__ (set to the name value passed in to new_module) and __doc__ (set to
the empty string).


Importing Encrypted Modules 

The imputil module makes it easy to modify importing behavior while reusing as
much of the current import functionality as possible (so you don’t have to rewrite
the whole thing yourself). This section uses imputil to read Python modules
stored in an encrypted format.


Tip importers.py (in Python's Demo/imputil directory) contains examples of
using imputil in different ways.

ImportManager is a class in imputil that takes care of locating and loading Python
modules. The install([namespace]) method installs the ImportManager
instance into the given namespace dictionary, defaulting to __builtin__ so that all
modules use it (namespace can be a module or a module dictionary):


>>> import imputil
>>> im = imputil.ImportManager()
>>> im.install()



As of Python 2.0, imputil and the PythonWin IDE have problems working
together. Try the examples of this section from a different IDE or from the
command line.



The ImportManager constructor can optionally take an instance of the
imputil.Importer class; see the next section for details.
Once the ImportManager is installed, you can add to its list of recognized suffixes
for Python modules by calling its add_suffix(suffix, importFunc) method.
When the import statement is used, the ImportManager searches through known
module locations (for example, sys.path) for files that have the requested module
name and an extension that matches one in ImportManager’s internal suffix list.
When found, it calls the importFunc to import that module.

The code in Listing 35-1 puts the ImportManager to work by adding the new file
suffix .pye. For now, .pye files contain only normal Python source code (in a later
example, they will contain encrypted bytecode). No real functionality is added,
except that you can now store Python code in .pye files.


Listing 35-1: importpye.py - Adds .pye as valid Python
module files


import imputil

def handle_pye(fullpath, fileinfo, name):
    # Print a debugging message
    print 'Importing "%s" from "%s"' % (name, fullpath)
    data = open(fullpath).read()
    return 0, compile(data, fullpath, 'exec'), {}

im = imputil.ImportManager()
im.add_suffix('.pye', handle_pye)
im.install()


Now create a .pye Python module. For example, save the following code to a file
called stuff.pye:

print 'I am being imported!'
a = 10
b = 'Hello'

After importing importpye, any other module can automatically import .pye
modules:

>>> import stuff # This fails - doesn't check .pye files yet
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named stuff
>>> import importpye
>>> import stuff # Now .pye files are checked and loaded
Importing "stuff" from "stuff.pye"
I am being imported!
>>> stuff.a, stuff.b
(10, 'Hello')
The importFunc passed to add_suffix takes three arguments: the full path to the
module file, a file information tuple (from a call to os.stat), and the name of the
module being imported. If the function doesn't return a value, the ImportManager
continues looking in other locations and with other suffixes until it loads a module,
finally raising ImportError if unsuccessful. This means your importFunc can choose to
ignore some import requests, because the ImportManager will keep looking if
needed.
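The following is a rough model of that search-and-fallthrough behavior. It is a minimal Python 3 sketch and not imputil's actual code. The names find_and_import and handle_pye are hypothetical. Each handler receives the same three arguments described above, and the first handler that returns a tuple wins.

```python
import os

def find_and_import(name, search_path, suffix_handlers):
    """Hypothetical model of the ImportManager search loop.

    suffix_handlers is a list of (suffix, import_func) pairs. Each
    import_func takes (fullpath, fileinfo, name) and returns either
    None (decline) or an (isPkg, code, initialDict) tuple.
    """
    for directory in search_path:
        for suffix, import_func in suffix_handlers:
            fullpath = os.path.join(directory, name + suffix)
            if not os.path.isfile(fullpath):
                continue
            result = import_func(fullpath, os.stat(fullpath), name)
            if result is not None:
                return result          # first handler that accepts wins
            # None means "not mine": keep trying other suffixes/locations
    raise ImportError('No module named ' + name)

def handle_pye(fullpath, fileinfo, name):
    # Same idea as Listing 35-1: .pye files hold plain Python source
    with open(fullpath) as f:
        source = f.read()
    return 0, compile(source, fullpath, 'exec'), {}
```

A call such as find_and_import('stuff', ['.'], [('.pye', handle_pye)]) returns the (isPkg, code, initialDict) tuple for ./stuff.pye, if that file exists.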

Your importFunc should either not return anything or return a 3-tuple (isPkg,
code, initialDict). isPkg is 1 if the module is actually a package directory, code
is a code object for the module (which will be executed in the namespace of the
new module), and initialDict is a dictionary containing any initial values you
want present in the new module's dictionary before the code object is executed.
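The roles of code and initialDict can be sketched as follows. The namespace is seeded first, and then the code object runs inside it. This is a Python 3 illustration with a hypothetical helper name, module_from_result, and it ignores isPkg for simplicity.

```python
import types

def module_from_result(name, result):
    """Hypothetical helper: build a module from an importFunc-style
    (isPkg, code, initialDict) tuple. isPkg is ignored in this sketch."""
    is_pkg, code, initial_dict = result
    module = types.ModuleType(name)
    # Values from initialDict are visible before the module body runs...
    module.__dict__.update(initial_dict)
    # ...and the code object is then executed in the module's namespace
    exec(code, module.__dict__)
    return module

code = compile("doubled = base * 2\n", "<demo>", "exec")
demo = module_from_result('demo', (0, code, {'base': 21}))
print(demo.doubled)   # 42
```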

With the import hook working, you can add support for decrypting the module as it
is being imported. Listing 35-2 expands the previous version of importpye.py to
decrypt the file contents before returning them to the ImportManager. It also adds a
utility function, encrypt, that takes a .py file and creates a .pye file containing
compiled and encrypted bytecode.


Listing 35-2: importpye.py - Imports encrypted Python
modules


import imputil, rotor, os, marshal

SECRET_CODE = 'bitakhon'
rot = rotor.newrotor(SECRET_CODE)

def encrypt(name):
    # Compiles and encrypts a Python file
    data = compile(open(name).read(), name, 'exec')
    base, ext = os.path.splitext(name)
    data = rot.encrypt(marshal.dumps(data))
    open(base+'.pye', 'wb').write(data)

def handle_pye(fullpath, fileinfo, name):
    # Print a debugging message
    print 'Importing "%s" from "%s"' % (name, fullpath)
    # Decrypt the raw file bytes first, then rebuild the code object
    data = rot.decrypt(open(fullpath, 'rb').read())
    return 0, marshal.loads(data), {}

im = imputil.ImportManager()
im.add_suffix('.pye', handle_pye)
im.install()
To test it, rename stuff.pye to just stuff.py (or use any other Python source
file) and use the encrypt function to create a .pye file:

>>> import importpye
>>> importpye.encrypt('stuff.py')

Now you can distribute the stuff.pye file, and programs can load it without
needing to handle the details of decryption:

>>> import importpye
>>> import stuff
I am being imported!
>>> stuff.a, stuff.b
(10, 'Hello')

With a little extra work, you can use this method to distribute Python modules
whose contents are relatively secure. Using the Python/C API, you can create a
small C program that embeds the Python interpreter and takes care of setting up
the rotor (or whatever other decryption engine you use) so that it's not overly
trivial for someone else to decrypt the files. Furthermore, by not advertising the fact
that your program is actually Python, and by grouping all the modules together into
a single archive file (perhaps as a pickled dictionary), you can prevent all but the
nosiest of people from obtaining your program source.
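The rotor module no longer exists in modern Python, but you can still try out the same compile, marshal, encrypt, decrypt, and unmarshal pipeline. The Python 3 sketch below swaps in a toy repeating-key XOR cipher as a stand-in for "whatever other decryption engine you use". XOR with a fixed key is not secure and is used here only to show the data flow. The key reuses the SECRET_CODE string from Listing 35-2, and the function names are hypothetical.

```python
import marshal
from itertools import cycle

SECRET = b'bitakhon'  # key from Listing 35-2; the XOR cipher is a toy stand-in

def xor_bytes(data, key=SECRET):
    # Toy cipher: XOR each byte with the repeating key (NOT secure).
    # Applying it twice restores the original bytes.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def encrypt_source(source, filename='<encrypted>'):
    # Compile to a code object, serialize it, then scramble the bytes
    code = compile(source, filename, 'exec')
    return xor_bytes(marshal.dumps(code))

def decrypt_code(blob):
    # Reverse the steps: unscramble, then rebuild the code object
    return marshal.loads(xor_bytes(blob))

blob = encrypt_source("a = 10\nb = 'Hello'\n", 'stuff.py')
namespace = {}
exec(decrypt_code(blob), namespace)
print(namespace['a'], namespace['b'])   # 10 Hello
```

As in Listing 35-2, the encrypted payload is marshaled bytecode. Bytecode only works with a matching Python version, so you would need to regenerate the encrypted files for each interpreter version you support.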

Cross-Reference Chapters 29 and 30 cover extending and embedding Python with C, and Chapter
12 teaches you how to serialize Python objects using the pickle and marshal
modules.


Retrieving Modules from a Remote Source 

The imputil.Importer class is a base class from which you derive custom import
subclasses. In this section, you'll create a subclass that retrieves Python modules
from a remote module repository.

Subclassing Importer 

Most subclasses of Importer override only one method, get_code(parent,
name, fqname). If not None, parent is a parent module in a module hierarchy. name
is the name of the module, and fqname is the fully qualified name (from the root of
the module namespace down to this module).

If get_code can't find the module or doesn't want to handle the request, it shouldn't
return anything. If it does load the module, the return value should be a 3-tuple of
the form (isPkg, code, initialDict), as with the importFunc in the previous
section.
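imputil was dropped in Python 3, and its role passed to importlib's finder and loader protocol (Python 3.4 and later). As a rough modern analogue of get_code, the sketch below returns None to decline a module and otherwise supplies code to execute. In this sketch the module source comes from an in-memory dictionary, which stands in for a remote repository. DictFinder and remote_greeting are hypothetical names, and this is an illustration rather than a drop-in replacement for the Importer subclass in this section.

```python
import importlib.abc
import importlib.util
import sys

class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader serving top-level modules from a dict."""

    def __init__(self, sources):
        self.sources = sources          # {module name: Python source text}

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.sources:
            return None                 # decline, like get_code returning nothing
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None                     # use the default module object

    def exec_module(self, module):
        # Compile the stored source and run it in the new module's namespace
        source = self.sources[module.__name__]
        code = compile(source, '<dict:%s>' % module.__name__, 'exec')
        exec(code, module.__dict__)

# Append (rather than insert at the front) so the normal finders run first,
# as the text recommends for slow sources such as a network repository
finder = DictFinder({'remote_greeting': "message = 'hello'\n"})
sys.meta_path.append(finder)
```

After this runs, import remote_greeting succeeds, and remote_greeting.message is 'hello'.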
The easiest way to use an Importer is to add it to sys.path. Normally, sys.path
holds directory names, but with the ImportManager installed, it can contain
directory names or Importers. Listing 35-3 creates a dummy Importer and installs it.


Listing 35-3: dumbimp.py - A dummy custom Importer 


import imputil, sys

# Create and install the ImportManager
ier = imputil.ImportManager()
ier.install()

class DummyImp(imputil.Importer):
    def get_code(self, *args):
        print 'Importing', args

# Install at the front of the list
sys.path.insert(0, imputil.BuiltinImporter())
sys.path.insert(0, DummyImp())

# Test it
import Tkinter


Running the program yields the following output: 


C:\temp>dumbimp.py
Importing (None, 'Tkinter', 'Tkinter')
Importing (None, 'FixTk', 'FixTk')    # Indirect imports
Importing (None, '_tkinter', '_tkinter')
Importing (None, 'types', 'types')
Importing (None, 'Tkconstants', 'Tkconstants')
Importing (None, 'string', 'string')
Importing (None, 'MacOS', 'MacOS')


Right behind the new importer is an instance of BuiltinImporter to handle
normal imports. When downloading modules from a remote source, the custom
importer should probably come last in the list so that all other importing
techniques are exhausted before an attempt is made to download the module over the
relatively slow network connection.


Creating the remote Importer 

The server side of the network connection is as simple as possible: it accepts
incoming connections, reads a request for a single module, and returns the Python
source code or an empty string if the module doesn't exist on the remote side. In
real-world applications, it's a good idea to add security, message compression, the
ability to handle multiple requests on a single socket, and lots of error checking.
Listing 35-4 shows the remote importer implementation.


Listing 35-4: rimp.py - Remote module importer 


import struct, SocketServer, imp, imputil
from socket import *

# Simple message layer - adds length prefix to each
# message so remote side knows how much data to read
MSG_HDR = '!I'
MSG_HDR_LEN = struct.calcsize(MSG_HDR)

def MsgSend(sock, msg):
    'Sends a message with a length prefix'
    # Add length prefix
    msg = struct.pack(MSG_HDR, len(msg)) + msg
    # Send until all is sent
    while msg:
        count = sock.send(msg)
        if count > 0:
            msg = msg[count:]

def MsgRecv(sock):
    'Reads and returns a message'
    # Read the prefix
    pre = sock.recv(MSG_HDR_LEN)
    if not pre:
        return
    count = struct.unpack(MSG_HDR, pre)[0]

    # Read the message
    msg = ''
    while 1:
        leftToRead = count - len(msg)
        if not leftToRead:
            break
        chunk = sock.recv(leftToRead)
        if not chunk:
            break  # Connection closed early; return what arrived
        msg += chunk
    return msg

# Server side
PORT = 55555
ADDRESS = '127.0.0.1'

class ImportHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        print 'Received new connection'
        msg = MsgRecv(self.request)
        print 'Remote side requests module', msg
        file = None
        try:
            file, name, info = imp.find_module(msg)
            source = file.read()
        except ImportError:
            source = ''
        if file:
            file.close()
        print 'Sending %d bytes' % len(source)
        MsgSend(self.request, source)
        print 'Done'

def StartServer():
    print '[Starting server]'
    serverClass = SocketServer.ThreadingTCPServer
    listenAddress = (ADDRESS, PORT)
    serverClass(listenAddress, ImportHandler).serve_forever()

# Client side
class RemoteImporter(imputil.Importer):
    def get_code(self, parent, name, fqname):
        print 'Checking remote host for module', name
        s = socket(AF_INET, SOCK_STREAM)
        s.connect((ADDRESS, PORT))
        MsgSend(s, name)
        code = MsgRecv(s)
        s.close()
        if not code:
            return

        # Save the module for next time
        open(name+'.py', 'wt').write(code)
        print 'Saved %s.py to disk' % name

        # Now return the code for this time
        return 0, compile(code, name+'.py', 'exec'), {}

if __name__ == '__main__':
    StartServer()
else:
    # The module is being imported, so install the
    # custom importer
    import imputil, sys

    # Install an ImportManager only if one has not
    # already been installed globally
    if __import__.__name__ != '_import_hook':
        ier = imputil.ImportManager()
        ier.install()

    sys.path.append(imputil.BuiltinImporter())

    # Install it at the end of the list
    sys.path.append(RemoteImporter())
Cross-Reference Chapter 15 covers sockets and SocketServers.


The first part of the program creates a simple messaging layer that adds a message 
length prefix to messages so that the receiving side knows how many bytes to read. 
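The same length-prefix framing can be written in Python 3, where socket data is bytes. The sketch below keeps the '!I' header from the listing (a 4-byte unsigned length in network byte order). It uses sendall in place of the manual send loop, and it keeps reading until the full count arrives, because a single recv may return fewer bytes than requested. The function names are hypothetical.

```python
import socket
import struct

HEADER = struct.Struct('!I')   # 4-byte unsigned length, network byte order

def send_msg(sock, payload):
    """Send payload (bytes) prefixed with its length."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, count):
    """Read exactly count bytes, or raise if the peer closes early."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError('connection closed mid-message')
        chunks.append(chunk)
        count -= len(chunk)
    return b''.join(chunks)

def recv_msg(sock):
    """Read one length-prefixed message and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

For a quick local check without a server, socket.socketpair() returns two connected sockets. Calling send_msg on one and recv_msg on the other returns the original bytes, including an empty payload, which is how the server signals "module not found".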

The server side of the importer subclasses SocketServer.BaseRequestHandler
to handle each incoming connection: it receives a request, finds the module with
imp.find_module, and sends back the Python source code (rather than bytecode,
which might be incompatible if the client and server sides have different versions of
Python). The server listens on a local address so that you can run both sides of the
example on a single computer.

The client side connects to the server, sends a request, and reads a response. If the
server sends back an empty string, it couldn't find the module either. If the module
was found, the client side writes it to disk so that future imports won't require the
network transfer.

Depending on how the module is loaded (as a standalone program or imported by
another module), the rimp module starts the listening server or installs the custom
importer.

Testing the remote Importer 

To see the remote importer work, first run it as a standalone program in a directory
that contains at least one other module (I ran it in the directory that had the
stuff.py module from previous sections):

C:\temp>rimp.py 
[Starting server] 

Now copy rimp.py to another directory (so that the "client" side doesn't have
access to the same modules) and start up a Python interpreter. Import rimp and
then import the module that the server side has:

>>> import rimp 
>>> import stuff 

Checking remote host for module stuff 
Saved stuff.py to disk 
I am being imported! 

The server side shows that it processed the request successfully: 
Received new connection
Remote side requests module stuff
Sending 49 bytes
Done

Now try a module that doesn’t exist anywhere: 

>>> import borkborkbork
Checking remote host for module borkborkbork
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "D:\Python20\lib\imputil.py", line 91, in _import_hook
    raise ImportError, 'No module named ' + fqname
ImportError: No module named borkborkbork

The normal (and correct!) ImportError is raised, even though the server tried to 
locate the module on its side: 

Received new connection
Remote side requests module borkborkbork
Sending 0 bytes
Done

Finally, look in the client-side directory and note that the module that was
transferred successfully has been cached so that next time no network transfer will be
needed:

C:\temp\t>dir /b
rimp.py
rimp.pyc
stuff.py # Yay!

In addition to the enhancements mentioned earlier, a more useful solution might
include versioning information so that the client automatically gets newer versions
from the server as needed.

The nicest part about the import hooks discussed in this chapter is that nothing
needs to change in any other modules in order for them to work. Only the initial
startup module needs to install the hooks; all other modules are completely
unaware that a module is being decrypted or transferred halfway around the world
via the Internet.
Summary

In this chapter, you:

- Learned how to replace the normal Python import function with a custom
  function.
- Created an import function to handle encrypted modules.
- Retrieved Python modules from a remote server via a network connection.

The next chapter covers Python's module and application distribution tools and
describes how you can bundle your entire program into a standalone executable
that works even if users don't already have Python installed.



Distributing Modules and Applications

Once you've created your Python masterpiece, how do
you get it into users' hands? This chapter answers that
question by introducing distutils, the tools you use to
distribute individual modules or entire applications.

Instead of providing an exhaustive and tedious review of the
distutils package, in writing this chapter I tried to focus on
what you need to know for 95 percent of the situations you
might encounter when distributing Python applications. Rest
assured, however, that the standard Python documentation
probably lists a special option or feature to cover each case in
the obscure 5 percent, and if anything is missing beyond that,
you can customize and extend the tools even further.



In This Chapter

Understanding distutils
Other distutils features
Distributing extension modules
Creating source and binary distributions
Building standalone executables


Understanding distutils

The distutils package was introduced in Python 1.6 to
standardize the process of building and installing third-party
Python libraries.

The main work when using distutils is creating the setup
script, which, by convention, is called setup.py. This small
Python program describes to distutils the files that need to
be in the distribution and gives additional information like
version numbers, author name, and so on.

The setup script tells distutils to bundle the necessary files
(which might be Python code, C source files, or other data
files) and generate whatever kind of distribution package you
want. Your distribution type can range from an ordinary ZIP
file to a full-blown Linux RPM or Windows installer.
Creating a simple distribution

The following is a simple example so you can see distutils in action. Listing 36-1
shows a small example library that I want to make available to other people.


Listing 36-1: timeutil.py - Time Utilities to
Be Packaged by distutils


import time as _time

def _getnow():
    'Returns current time tuple'
    return _time.localtime(_time.time())

def time():
    'Returns current time as string'
    return _time.strftime('%I:%M %p', _getnow())

def date():
    'Returns current date as string'
    return _time.strftime('%b %d, %Y', _getnow())


With my library ready, it's time to create the setup script, shown in Listing 36-2.


Listing 36-2: setup.py - Setup Script for timeutil Distribution


from distutils.core import setup

setup(name='timeutil',
      version='0.9',
      author='pokey',
      author_email='pokey@yellow5.com',
      url='www.yellow5.com/pokey',
      py_modules=['timeutil'])
The setup script is very simple: it imports distutils and calls the setup
function with a bunch of keyword arguments that give basic information about your
software (other standard arguments you can use include maintainer,
maintainer_email, license, description, and long_description).

The py_modules argument names a list of Python modules to include in the
distribution; this simple example has only one. You can specify modules that are part of
a package as 'packagename.modulename' (this assumes that
packagename/__init__.py really exists) or as files in other directories as
'directory/modulename'.

Note The setup script is meant to be cross-platform compatible, so always use forward
(UNIX-style) slashes in directory names; distutils takes care of converting
them as needed on each different platform.

Keep in mind that the setup script is just a Python program, so any valid Python
code works in your setup script. When you run the setup script you supply a
command argument telling distutils what you want it to do. In this case I want to
create a Windows installer, so I run the command like this:

C:\temp>setup.py bdist_wininst

This command and others are covered in "Creating Source and Binary
Distributions" later in this chapter. Assuming all went well, in the dist directory
you will find a file called timeutil-0.9.win32.exe (0.9 was the version I
chose in the setup script). Because I chose a platform-specific distribution format,
the file name also includes the platform required to run it (win32).

That's it! My module is now ready for distribution.

Installing the simple distribution 

Now imagine that you're one of the lucky few to have gained possession of the
powerful timeutil library and that you want to install and begin using it. Running the
program displays a screen like the one shown in Figure 36-1.
Figure 36-1: The main screen of the distutils-generated installer for Windows (window title "Setup timeutil-0.9")


To install the timeutil module, click Next a few times. The module is then in a
location on your system where all Python programs can find it. For example, after
starting up a Python interpreter from any directory:

>>> import timeutil
>>> timeutil.time(), timeutil.date()
('02:07 PM', 'Feb 19, 2001')

The distutils package chooses the correct default location for third-party
modules based on the current platform. On UNIX, for example, the default directory is
usually /usr/local/lib/pythonx.y/site-packages and on Windows, it's
c:\pythonxy, where x and y are major and minor version numbers.
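If you want to check where your own interpreter puts third-party modules, current Python versions report it through the standard sysconfig module. (distutils itself was removed in Python 3.12.) The following is a small Python 3 sketch. The exact path varies by platform, installation, and virtual environment, and it may not match the Python 2.0 defaults listed above.

```python
import sys
import sysconfig

# Default install directory for pure-Python third-party modules
purelib = sysconfig.get_path('purelib')
# The "x.y" version string used in such paths
version = sysconfig.get_python_version()

print('Python', version, 'installs third-party modules into', purelib)
```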

Another distribution method is to give the source files and the setup script to the 
user as-is (or in a ZIP file or compressed tarball from which the user first extracts 
the files). The setup script also acts as the installation script. To install the 
timeutil module, you simply run: 


setup.py install 
Once again, distutils installs the module where all programs can find it.

Tip You should also create a file called README or README.txt that gives a brief
description of your distribution and maybe a little help on how to install it.
distutils automatically includes these README files in the distribution, if present.

Tip The --home=<dir> command-line argument tells the install command to
place the modules in a different directory than the default. This option can be
useful on systems where normal users don't have write access to the default directory.


Other distutils Features

As I mentioned before, distutils has features to handle just about any sort of
situation you might encounter. In this section I cover a few of the most useful features.

Distributing packages 

If you install more than one or two modules in the default directory, that directory
starts to become pretty cluttered. Worse, if you want to uninstall a particular
distribution, you have a tough time determining which files go with which third-party
library (because, by default, they all end up in the same directory).

A better approach is to distribute your modules as a package (which in turn could
include other packages too). This method is much more organized and requires
very little extra work from you. It is also less prone to errors: distutils
automatically includes all the Python files that are part of a package so you don't have to list
each file individually.

So, as an advocate of clean directory structures, suppose I decide to go back and
distribute my timeutil module as a package. In fact, envisioning it to be part of
some future suite of utilities, I rename it to be the datetime module in the
daveutil package. The conversion is easy: create a daveutil directory and copy
timeutil.py into it, renaming it to datetime.py. Inside the daveutil directory, I
create an __init__.py file (which can simply be empty or contain a comment) to
identify daveutil as a package.

Listing 36-3 shows the slightly modified setup script that uses the packages
keyword argument to list the packages it will include. (Once again, like py_modules,
this is a list, so it could include several package names.)
Listing 36-3: setup.py - A Setup Script That
Distributes an Entire Package


from distutils.core import setup

setup(name='daveutil',
      version='0.9',
      author='pokey',
      author_email='pokey@yellow5.com',
      url='www.yellow5.com/pokey',
      packages=['daveutil'])


Now the resulting distribution from setup.py installs the daveutil package,
leaving the main default install directory clutter free. Users can still access the new
daveutil.datetime module from any program:

>>> from daveutil import datetime
>>> datetime.date()
'Feb 19, 2001'

The package_dir keyword argument enables you to use a different directory
scheme if you don't want to use the default one. Its value is a dictionary whose keys
are package names and whose values are directory names. To change the directory
for modules that aren't part of any package, use a key of an empty string. For
example, if src is the base directory of all your source code, you could use the following
portion of a setup script:

package_dir = {'': 'src'},
py_modules = ['mod1', 'mod2']

This code causes distutils to look for the modules src/mod1.py and
src/mod2.py.
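The following hypothetical helper, module_source_path, makes the lookup concrete. It approximates the rule described above. Starting from the module's package, it walks toward the root and uses the first package_dir entry it finds, falling back to the '' key. It is a simplified sketch for experimentation, not distutils' own code.

```python
import posixpath

def module_source_path(module, package_dir):
    """Hypothetical helper: guess the source file for a dotted module name."""
    parts = module.split('.')
    package, name = parts[:-1], parts[-1]
    tail = []
    # Try 'a.b', then 'a', keeping the unmatched part as subdirectories
    while package:
        key = '.'.join(package)
        if key in package_dir:
            base = posixpath.join(package_dir[key], *tail)
            break
        tail.insert(0, package.pop())
    else:
        # No package matched: resolve relative to the root ('' key)
        base = posixpath.join(package_dir.get('', ''), *tail)
    return posixpath.join(base, name + '.py')

print(module_source_path('mod1', {'': 'src'}))           # src/mod1.py
print(module_source_path('daveutil.datetime', {}))       # daveutil/datetime.py
```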

Including other files 

If you need to include additional, non-Python files in your distribution, you can use
the data_files keyword argument to setup:

data_files = ['dialog.res', 'splash.jpg'],
Each item in the list can also be a tuple containing a destination directory name
and a list of files. For example, to have the installer put dialog.res and
splash.jpg into the resource directory, use:

data_files = [('resource', ['dialog.res', 'splash.jpg'])],
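The helper below sketches where each entry ends up. It is based on my understanding of distutils' install_data command: bare file names are copied straight into the installation data directory, and a relative directory in a tuple is resolved against that directory (an absolute one is used as-is). The helper name data_file_targets and the '/usr/local' install directory are assumptions chosen for illustration.

```python
import posixpath

def data_file_targets(data_files, install_dir):
    """Hypothetical helper: list where each data file would be copied."""
    targets = []
    for entry in data_files:
        if isinstance(entry, str):
            # A bare file name goes directly into the install directory
            targets.append(posixpath.join(install_dir, posixpath.basename(entry)))
        else:
            directory, files = entry
            # Relative directories are joined to install_dir;
            # an absolute directory replaces it
            directory = posixpath.join(install_dir, directory)
            for name in files:
                targets.append(posixpath.join(directory, posixpath.basename(name)))
    return targets

print(data_file_targets([('resource', ['dialog.res', 'splash.jpg'])], '/usr/local'))
```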


If you want total control over which files end up in a source distribution, create a
file called MANIFEST in the same directory as your setup script. The file should
contain one file name per line. If specifying each file is too much of a pain, create a
manifest template file (call it MANIFEST.in) that distutils uses to generate the list of
files to include. Each line of the file contains a rule describing a group of files. For
example, to include any text files in the current directory and any Python files
anywhere in the current directory or its subdirectories whose names start with 'd', the
MANIFEST.in file looks like:

include *.txt
global-include d*.py

Table 36-1 lists the rules you can use in the manifest template file. 


Table 36-1
Manifest Template File Rules

Rule                               Description
include p1 p2 ...                  Include any files matching any of the patterns.
recursive-include dir p1 p2 ...    Same, but search only in dir and its subdirectories.
global-include p1 p2 ...           Same, but search the current directory and all subdirectories.
graft dir                          Include all files in dir and its children.
exclude p1 p2 ...                  Exclude any files matching any of the patterns.
recursive-exclude dir p1 p2 ...    Same, but search only in dir and its subdirectories.
global-exclude p1 p2 ...           Same, but search the current directory and all subdirectories.
prune dir                          Exclude all files in dir and its children.


distutils applies the rules in order, so you can arrange them to specify any list of
files. In addition to valid file name characters, patterns can include asterisks (*) to
match any sequence of characters, question marks (?) to match any single
character, and [range] to match a range of characters, like [a-f0-9] and [b-f].
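Python's fnmatch module understands the same three wildcard forms, so you can use it to try out a pattern before putting it in MANIFEST.in. The following is a small Python 3 sketch. The file list is made up for illustration. Note that distutils' own matcher also prevents * from matching across directory separators, and this sketch does not model that.

```python
from fnmatch import fnmatchcase

# Made-up file names for illustration
files = ['README.txt', 'dialog.res', 'demo.py', 'dbutil.py', 'setup.py', 'd1.py']

def matching(pattern):
    # Case-sensitive match, one file name at a time
    return [f for f in files if fnmatchcase(f, pattern)]

print(matching('*.txt'))        # ['README.txt']
print(matching('d?.py'))        # ['d1.py']
print(matching('[a-f]*.py'))    # ['demo.py', 'dbutil.py', 'd1.py']
```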
Note As of Python 2.0, the data_files argument works only with binary distributions
and source distributions that have a manifest file.

Customizing setup 

distutils also checks for a file called setup.cfg for additional configuration options.
These options override any corresponding settings from the setup script, but they
themselves are overridden by corresponding settings specified on the command
line. This configuration file is useful if you need to let users customize setup or if
there are some settings you always need to specify.

The format of the configuration file is

[command]
variable=value

where command is one of the standard commands like bdist (for a complete list,
run setup.py --help-commands). Each variable is a setting for that command
(you can get a list of settings for a command by running setup.py <command>
--help). To continue a value onto the next line, just indent the next line's value.

Note If the command-line version of a setting has a dash in it, use an underscore
character in the configuration file instead. Also, if a setting is normally an "on-off" type
flag (for example, --quiet), write it as setting=1 in the configuration file.
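These are ordinary INI-style files. The Python 3 sketch below reads a small setup.cfg-style file with the standard configparser module, to show the section, dash-to-underscore, and flag conventions. The helper option_to_key is hypothetical, and the option values are illustrative.

```python
import configparser

SETUP_CFG = """\
[sdist]
formats = zip,gztar

[install]
no_compile = 1
"""

def option_to_key(option):
    # Hypothetical helper: '--no-compile' on the command line -> 'no_compile'
    return option.lstrip('-').replace('-', '_')

config = configparser.ConfigParser()
config.read_string(SETUP_CFG)

print(config.get('sdist', 'formats').split(','))            # ['zip', 'gztar']
print(config.getboolean('install', option_to_key('--no-compile')))   # True
```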

You may want to use some settings always, even across all projects. In this case you
can create a pydistutils.cfg file in the directory specified by sys.prefix, and
distutils will read settings from it before reading from a project's setup.cfg,
if any.


Tip On UNIX systems, each user can also create a .pydistutils.cfg file in his or
her home directory for user-specific custom settings.

Distributing Extension Modules 

The distutils package doesn't just work with Python files: it is quite happy to
distribute C extension modules too. Pass the ext_modules keyword argument to the
setup function to specify which extensions to include, for example:

ext_modules = [ext1, ext2]
Each extension you list is actually an instance of the Extension class. Here's a
more complete setup script that includes one extension module called trade that
is built from two source files, stock.c and option.c:

from distutils.core import setup, Extension # nota bene!

trade_ext = Extension('trade', ['stock.c', 'option.c'])

setup(name='trader', ext_modules=[trade_ext])

The first argument to the Extension constructor is the module name including the
package name, if any. If you plan on listing several extensions belonging to the same
package, you can instead give the package name once with the ext_package keyword
argument to setup.

The Extension constructor also takes some optional keyword arguments of its
own. include_dirs is a list of directories in which the compiler should look for
include files, library_dirs is a list of directories to search at link time, and
libraries is a list of library names to link against.

The define_macros and undef_macros keyword arguments are lists of preprocessor
definitions to use when compiling:

trade_ext = Extension('trade', ['stock.c', 'option.c'],
                      define_macros=[('DEBUG_LOGGING', None),
                                     ('MAX_COUNT', '100')],
                      undef_macros=['TRACE'])

The preceding code is equivalent to having the following code at the top of every
source file:

#define DEBUG_LOGGING
#define MAX_COUNT 100
#undef TRACE
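In practice, the macros are not pasted into your source files. On Unix-style compilers such as gcc, they reach the compiler as -D and -U command-line options. The following Python 3 sketch shows that translation for macros like the ones above. The function name preprocess_options is hypothetical, and other compilers may spell these options differently.

```python
def preprocess_options(define_macros=(), undef_macros=()):
    """Hypothetical helper: turn macro lists into Unix-style compiler flags."""
    flags = []
    for name, value in define_macros:
        if value is None:
            flags.append('-D' + name)                    # like "#define NAME"
        else:
            flags.append('-D%s=%s' % (name, value))      # like "#define NAME value"
    for name in undef_macros:
        flags.append('-U' + name)                        # like "#undef NAME"
    return flags

print(preprocess_options([('DEBUG_LOGGING', None), ('MAX_COUNT', '100')],
                         ['TRACE']))
# ['-DDEBUG_LOGGING', '-DMAX_COUNT=100', '-UTRACE']
```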

See the following section for information on how C extension modules are handled
with different distribution types.


Creating Source and Binary Distributions 

You can create distributions containing just source code, or binary distributions
too. In this section I show you how to generate each type of distribution using the
same setup script so you can easily compare the results. The setup script is as
follows:

from distutils.core import setup, Extension
ext = Extension('ext', ['extension.c'])

setup(name='daveutil',
      version='0.9',
      author='pokey',
      author_email='pokey@yellow5.com',
      url='www.yellow5.com/pokey',
      py_modules=['pymod'],
      ext_modules=[ext])

In the preceding example, extension.c is a simple C extension module and
pymod.py is a small Python file with a single function in it; both files are in the same
directory as the setup.py listed above.

Cross-Reference Chapters 29 and 30 show you how to create C extension modules for Python.



Source distributions 

A source distribution contains Python and C source files (no bytecode or compiled 
C files). This type of distribution is the quickest to generate and you can use it on 
any platform. The following command creates a source distribution: 

python setup.py sdist 

The output file ends up in the dist directory, and its default type depends on your
platform (for example, a ZIP file on Windows). On my machine, the finished file was
daveutil-0.9.zip and it contained these files:

extension.c pymod.py README.txt

You can choose the output file type with the --formats=f1,f2,... argument. Use
the following command to see the output formats available:

C:\temp>setup.py sdist --help-formats
List of available source distribution formats:
  --formats=bztar  bzip2'ed tar-file
  --formats=gztar  gzip'ed tar-file
  --formats=tar    uncompressed tar file
  --formats=zip    ZIP file
  --formats=ztar   compressed tar file

The availability of different formats also depends on other libraries you have
installed (such as zlib for compression).

Users who download your distribution archive use a command similar to the
following to install it:

setup.py install
The setup script installs the files in the correct place for the end user's machine,
and it builds extension modules automatically. Of course, if the user doesn't have a
compiler installed, he or she can't build the C extension modules using this kind of
distribution.

Binary distributions 

Binary distributions include the Python source code, byte-compiled versions of
each file, and the compiled versions of any C extension modules. The C source code
is not included, making binary distributions suitable for users who don't have
compilers or for cases where you don't want to distribute the C source. The drawback is
that C extension modules you provide in the distribution work only on compatible
platforms, so if you want to make it available on both Windows and Linux
platforms, for example, you need to create two different distribution packages.

Use the following command to create a binary distribution:

setup.py bdist

distutils kindly builds your extension modules for you and places the compiled
modules into the archive. On my machine, the finished file was
daveutil-0.9.win32.zip and it contained these files:

pymod.pyc ext.pyd pymod.py 

Once again, you can use the --formats and --help-formats options to
choose and list output formats. Users install your distribution the same way as
before, only this time they don’t need a compiler.

Installers 

One other form of binary distribution is an installer, like the one I used in the first
section of this chapter. These work the same way as normal binary distributions
except that they have an installation program familiar to users of the target system.

Most Windows users are familiar with downloading an executable program that 
they run to install the program for them. To create such an executable, run this 
command: 

setup.py bdist_wininst

On my computer, this command created the file daveutil-0.9.win32-py2.0.exe.
When you run it, you see a few dialogs letting you know what it’s going to install 
and where. 
Linux folk are used to downloading and installing RPM files, which perform
essentially the same function but without the glitzy user interface. To generate an RPM
file, use:

setup.py bdist_rpm

You can optionally add a --source-only parameter to build just a source RPM or
--binary-only to build only a binary RPM. RPMs also have a .spec file that
describes them; distutils generates this file automatically for you using the
information from the setup script, command line, and configuration files. You can
specify other .spec options that aren’t part of a normal Python distribution using the
parameters listed in Table 36-2.



Table 36-2
Linux RPM SPEC Options

Option                 Meaning
--distribution-name    Name of the Linux distribution for this RPM
--release              RPM release number
--serial               RPM serial number
--vendor               Vendor or author (defaults to author or maintainer in setup.py)
--packager             RPM packager (defaults to vendor)
--group                Package classification (defaults to Development/Libraries)
--icon                 Icon file to use
--doc-files            Comma-separated list of documentation files
--changelog            Path to RPM change log
--provides             Capabilities provided by this RPM
--requires             Capabilities required by this RPM (dependencies)
--build-requires       Capabilities required to build the RPM
--conflicts            Capabilities that conflict with this RPM
--obsoletes            Capabilities made obsolete by this RPM


The type of distribution you choose to create depends on who you think will use it. 
When possible, it doesn’t hurt to create several different types so that users can
choose whichever they find most convenient. 
Building Standalone Executables

Despite all the wonderful things about Python, most people do not have it installed 
on their computers. Worse, those that do have it may have a version that conflicts 
with the version you used to create your program. The tools in this section show 
you how to create a self-contained executable that has the Python interpreter, your 
Python modules, and everything else needed to run your program with no other 
dependencies. 

py2exe 

My favorite tool for building standalone Windows applications is Thomas Heller’s
py2exe (available at http://py2exe.sourceforge.net). It extends the distutils
package so it fits in nicely with the topics covered so far in this chapter, and it is very
simple to use.

For an example, I’ll use this small program saved as hello.py:

import sys 

print sys.version 
print 'Hello!'

Here’s the setup script, setup.py:

from distutils.core import setup
import py2exe

setup(name='hello', scripts=['hello.py'])

The differences from an ordinary setup script are that you import py2exe before
calling setup, and that you list your module name, including the .py extension, in the
scripts list. The command to use with the setup script is py2exe:

setup.py py2exe 

The preceding command creates hello.exe in dist\hello. Also in that directory
are python20.dll and msvcrt.dll (a supporting library). The program runs like
any other executable:

C:\temp\dist\hello>hello
2.0 (#8, Oct 19 2000, 11:30:05) [MSC 32 bit (Intel)]
Hello!
The py2exe program figures out what other libraries and files it needs to include in
order to make your program truly self-contained; you can create executables even
for something complex like a GUI application using wxPython.

You may be alarmed at the size of the files for the simple hello program (about
1 MB). Don't worry: most of that is fixed-size overhead, so a program with 10
times as many lines of Python code is still very small.

Use setup.py py2exe --help to see a list of optional arguments you can use on
the command line or in the setup configuration file. For example, --debug
generates an executable with debug information and --icon enables you to specify an
icon file that the application should use. --includes lets you add other modules to
those that py2exe detects that it should include, and --force-imports adds the
given modules to sys.modules before your script begins to run.


Tip: The current version of py2exe can't detect imports made by calls to the
__import__ function (as opposed to the import statement), to
PyImport_ImportModule (instead of PyImport_Import), or to modules
whose names aren't known until runtime. Force py2exe to include these modules
by using the --includes option. For PyImport_ImportModule calls, you
should use --force-imports so that the modules will already be in
sys.modules by the time the C code calls for them.




Freeze 

The freeze utility is a nice alternative for creating standalone programs because it
comes as part of the standard Python distribution, and it is not limited to Windows
computers. You do, however, need to have a compiler installed. freeze determines
the modules your program needs, compiles them to bytecode, and stores
(“freezes”) the bytecode in huge C byte arrays. A small embedding application
starts up a Python interpreter and notifies the import mechanisms of the frozen
modules it has so that imports don’t require external Python files to be present.

The freeze utility predates Python’s distutils, so you don’t write a setup script
like you do for py2exe. Instead, just type:

python freeze.py hello.py 

Of course, you may have to specify the location of freeze.py. On my FreeBSD
system, it lives in /usr/local/lib/python2.0/Tools/freeze. freeze creates a
bunch of C files and a Makefile; usually all you need to do now is type make to
build the executable.

In order to use freeze, you need to have built Python from the source
distribution.

Other tools

Gordon McMillan has developed a small suite of tools for creating standalone
executables for Windows and Linux; you can download them from
http://mcmillan-inc.com.


Archives 

Archives work like the freeze utility except that archives store the bytecode in a
compressed archive to take up less space. One nice side effect that archives and
freeze executables enjoy is reduced disk I/O because all the modules are in a
single file; these applications tend to load up quicker because the
interpreter doesn’t have to hunt through sys.path to locate the modules to load.

Standalones 

Standalones store the compressed bytecode in an embedding application, but also
link in as many of the binary dependencies as possible so that the result is a single
executable that users can easily run, copy, or delete.

Installer 

Gordon’s tools also come with a simple installer that generates self-extracting (and
self-cleaning when finished) installation programs. One nice feature is that they
can even detect if they are being run from read-only media such as a
CD-ROM and still run correctly (using an alternate location for temporary
decompression storage).

This set of tools is very flexible and has many options to customize its behavior. Its
different pieces are kept as separate as possible while still remaining interoperable
so that you can mix and match (or extend) different pieces to suit your specific
needs.


Summary 

Once you’ve written your program, you still have the task of delivering it to your
users. Fortunately, Python’s distutils package makes this process relatively
painless. In this chapter you:

- Created distribution packages that automatically install files in the correct
  place on end users’ computers.

- Built distributions that included just the source code.

- Built distributions that included precompiled C extension modules.

- Wrapped your application in an easy-to-use installer for Windows or Linux.

- Created self-contained Windows applications that don’t require a preexisting
  Python installation and don’t conflict with other versions of Python.

The next chapter shows you how to make the most of the Windows-specific
modules that come with Python.


Platform-Specific Support

Chapter 37: Windows

Chapter 38: UNIX-Compatible Modules

Windows 



Most of Python’s libraries are portable. However, sometimes
the need arises to take advantage of OS-specific
services, such as the Windows registry. Accordingly, Python’s
standard libraries provide some Windows-specific support. In
addition, the Python Extensions for Windows (win32all) wrap
most of the Win32 API, so you can do plenty of Windows
programming without even having to write a C extension.


Using win32all 

The Python Extensions for Windows, also known as win32all,
include wrappers for much of the Windows API. If you’ve done
Windows programming before, you should feel right at home
with win32all! Currently, win32all is hosted at ActiveState
(www.activestate.com), and is part of the ActivePython
distribution.

I keep a copy of Visual Studio running when I program with
win32all so that I can consult MSDN as needed. The win32all
package includes some documentation, but at some point
you’ll probably want to have a comprehensive reference on
the Win32 API.


In This Chapter

Using win32all
Example: using some Windows APIs
Accessing the Windows registry
Using msvcrt goodies


Data types 

In places where the Windows API would use a struct, win32all
often uses a dictionary. The dictionary’s keys are the names of
the struct’s data members; its values are the corresponding
values. For example, the Windows API NetUserGetInfo can
return information about a user in the form of a struct:


typedef struct _USER_INFO_10 {
    LPWSTR usri10_name;
    LPWSTR usri10_comment;
    LPWSTR usri10_usr_comment;
    LPWSTR usri10_full_name;
} USER_INFO_10;
When you use win32all to manipulate user info, you use the corresponding
dictionary (note that its keys drop the usri10_ prefix):

>>> win32net.NetUserGetInfo(None,"Administrator",10)
{'full_name': u'', 'name': u'Administrator', 'usr_comment':
u'', 'comment': u'Built-in account for administering the
computer/domain'}


Error handling

The win32all modules translate any API errors into the exception
win32api.error. This error has a member, args, which takes the form
(ErrorCode, FunctionName, Info). For example:


>>> win32net.NetUserGetInfo(None,"Doctor Frungy",10)
Traceback (innermost last):
  File "<pyshell#51>", line 1, in ?
    win32net.NetUserGetInfo(None,"Doctor Frungy",10)
api_error: (2221, 'NetUserGetInfo', 'The user name could not be found.')


Finding what you need 

The Windows API is a large beast, and could easily fill a book larger than this one.
And so, finding a function that does what you want can take some sifting. I generally
search MSDN for online help. The book Programming Windows, by Charles Petzold
(Microsoft Press 1998), is also an excellent (and readable) reference on the
Windows API. And if you want to read up on win32all itself (particularly the COM
extensions), Python Programming on Win32, by Mark Hammond and Andy
Robinson (O’Reilly and Associates 2000), is a good reference.

You may discover that win32all does not yet expose the API you want. If so, your
best recourse is to create a C extension to wrap the API. If you do, the source code
for win32all is a good reference to borrow ideas from. See Chapter 29 for an
introduction to C extensions.


Example: Using Some Windows APIs 

Listing 37-1 illustrates some of the APIs that win32all provides. The program is a
simple text editor. It uses some predefined constants from the win32con module
(which provides about 4000 different constants!). It uses Tkinter to put up a simple
GUI (see Chapter 19 for more information on Tkinter). And it uses the win32help
and win32clipboard modules to access the Windows help system and the
clipboard.

Listing 37-1: TextEditor.py


import Tkinter
import sys
import win32help       # Launching .hlp files
import win32con        # Constants used by the other win32 modules
import win32clipboard  # Clipboard APIs

class TextEditor:
    def __init__(self,root):
        self.root=root
        # Create the menus:
        MenuBar=Tkinter.Menu(root)
        FileMenu=Tkinter.Menu(MenuBar,tearoff=0)
        FileMenu.add_command(label="Quit",command=sys.exit)
        MenuBar.add_cascade(label="File",menu=FileMenu)
        EditMenu=Tkinter.Menu(MenuBar,tearoff=0)
        EditMenu.add_command(label="Copy",command=self.DoCopy)
        EditMenu.add_command(label="Paste",
                             command=self.DoPaste)
        MenuBar.add_cascade(label="Edit",menu=EditMenu)
        HelpMenu=Tkinter.Menu(MenuBar,tearoff=0)
        HelpMenu.add_command(label="Index",command=self.DoHelp)
        MenuBar.add_cascade(label="Help",menu=HelpMenu)
        root.config(menu=MenuBar)
        # Create the main text window:
        self.TextWindow=Tkinter.Text(root)
        self.TextWindow.pack(expand=Tkinter.YES,
                             fill=Tkinter.BOTH)

    def DoCopy(self):
        Selection=self.TextWindow.tag_ranges(Tkinter.SEL)
        if len(Selection)>0:
            SelectedText=\
                self.TextWindow.get(Selection[0],Selection[1])
            # One must open (and lock) the clipboard before
            # using it, then close (and unlock) the clipboard
            # afterwards:
            win32clipboard.OpenClipboard(0)
            # Empty the clipboard so that this window owns it:
            win32clipboard.EmptyClipboard()
            # SetClipboardText is a shortcut for
            # SetClipboardData(CF_TEXT, text):
            win32clipboard.SetClipboardText(SelectedText)
            win32clipboard.CloseClipboard()

    def DoPaste(self):
        win32clipboard.OpenClipboard(0)
        PasteText=win32clipboard.GetClipboardData(\
            win32con.CF_TEXT)
        win32clipboard.CloseClipboard()
        self.TextWindow.insert(Tkinter.INSERT,PasteText)

    def DoHelp(self):
        # win32help includes a single function, WinHelp, that
        # wraps the WinHelp API. Here, we open the help file
        # "Editor.hlp" to its index.
        win32help.WinHelp(0,"Editor.hlp",win32con.HELP_INDEX)

# Main code:
root=Tkinter.Tk()
TextEditor(root)
root.mainloop()


Accessing the Windows Registry 

The Windows registry is a repository of system information. It keeps track of users,
program settings, port information, and more. The registry takes the form of a tree,
where each node of the tree is called a key. Each key can have one or more named
values. Each top-level key is called a hive. The usual way to access the registry by
hand is by running the program regedit; another good registry browser is
regedt32.exe.

For example, Windows stores your system’s Internet Explorer version number in
the value Version in the key Software\Microsoft\Internet Explorer in the
HKEY_LOCAL_MACHINE hive.

Breaking the registry can have very weird, very bad effects. Always back up the
registry before running any code that tweaks it. Otherwise, a single typo might break
your system!


Accessing the registry with win32all 

To examine an existing key, call win32api.RegOpenKeyEx(Hive, Subkey, 0[,
Sam]). Here Hive is the top-level key to start from; it is generally one of the win32con
constants HKEY_CLASSES_ROOT, HKEY_CURRENT_USER, HKEY_LOCAL_MACHINE, or
HKEY_USERS. Subkey is the subkey to open, as a string. And Sam is a combination of
flags indicating the level of key access we want. I generally use
win32con.KEY_ALL_ACCESS, but KEY_READ (the default value) is safer if you don’t
want to risk breaking the registry. Table 37-1 lists the available access levels.
The first argument to RegOpenKeyEx need not be a hive; it can be any registry key
handle. In this case, the subkey name should be the path to the subkey from the
specified key, instead of from the hive.


Table 37-1
Registry Access Constants (from win32con)

Constant                  Ability Granted
KEY_ALL_ACCESS            Full access
KEY_READ                  Read access
KEY_WRITE                 Write access
KEY_CREATE_LINK           Create symbolic links
KEY_CREATE_SUB_KEY        Create subkeys (included in KEY_WRITE)
KEY_ENUMERATE_SUB_KEYS    Iterate over subkeys (included in KEY_READ)
KEY_EXECUTE               Read access
KEY_NOTIFY                Change notification (included in KEY_READ)
KEY_QUERY_VALUE           Read key values (included in KEY_READ)
KEY_SET_VALUE             Modify key values (included in KEY_WRITE)


A call to RegOpenKeyEx returns a key handle. Once you have this handle, you can
call RegQueryValueEx(KeyHandle, Name) to retrieve a key value. Here Name is
the name of the value (or an empty string to query the key’s default, unnamed value).
RegQueryValueEx returns a tuple of the form (Value, ValueType). You can also set
values by calling RegSetValueEx(KeyHandle, Name, 0, ValueType, Value). Here
ValueType is a constant indicating the data type of Value. Table 37-2 shows the
most common value types.

When you are finished with a registry key, you should close it with a call to
RegCloseKey(KeyHandle).

You can access the registry on a remote Windows system, if that system’s security
settings permit this. To obtain a key handle for the remote registry, call
RegConnectRegistry(SystemName, Hive). Here Hive is one of the hive constants
from win32con, other than HKEY_CLASSES_ROOT or HKEY_CURRENT_USER. The
parameter SystemName is a string of the form \\computername.
Table 37-2
Common Registry Value Types (from win32con)

Constant        Meaning
REG_SZ          String
REG_DWORD       A 32-bit integer
REG_BINARY      Binary data
REG_MULTI_SZ    Array of strings


Example: setting the Internet Explorer home page

Internet Explorer has a home page, or “start page,” that appears when you start the
application. Windows stores the URL of the home page in the registry. Listing 37-2
examines, and then tweaks, the home page URL:


Listing 37-2: HomePage.py 


import win32api
import win32con

SubKey="SOFTWARE\\Microsoft\\Internet Explorer\\Main"
StartPageKey=win32api.RegOpenKeyEx(win32con.HKEY_CURRENT_USER,
    SubKey,0,win32con.KEY_ALL_ACCESS)

(OldURL, ValueType)=win32api.RegQueryValueEx(StartPageKey,
    "Start Page")
print OldURL

NewURL="http://www.google.com"

win32api.RegSetValueEx(StartPageKey,"Start Page",0,
    win32con.REG_SZ,NewURL)
win32api.RegCloseKey(StartPageKey)


Creating, deleting, and navigating keys 

The win32api function RegCreateKey(Hive, Subkey) creates a subkey in the
specified hive, and returns a handle to the new key. The function RegDeleteKey(Hive,
SubkeyName) deletes the specified key, and RegDeleteValue(KeyHandle, Name)
deletes the specified value from a key. Note that RegDeleteKey cannot delete a key
that has any subkeys.

The function RegEnumKey(KeyHandle, Index) retrieves the names of the subkeys
of the specified key. It raises an exception (win32api.error) if the key has no
subkey with the specified Index. For example, this code prints the immediate subkeys
of the HKEY_LOCAL_MACHINE hive:

import win32api
import win32con

try:
    SubKeyIndex=0
    while 1:
        print win32api.RegEnumKey(
            win32con.HKEY_LOCAL_MACHINE, SubKeyIndex)
        SubKeyIndex += 1
except win32api.error:
    pass # (We ran out of subkeys.)

The function RegEnumValue(KeyHandle, Index) retrieves the value at position Index
of the specified key. Its return value is a tuple of the form (ValueName, Value, ValueType).

Often programmers keep calling the enumerator functions until they raise an
exception. However, one can also call RegQueryInfoKey (see “Other registry functions”
later in this chapter), and iterate over subkeys and values without ever triggering
exceptions.

Example: recursive deletion of a key

Listing 37-3 provides a function to delete a registry key. Unlike RegDeleteKey, it
can kill off a key with subkeys.


Listing 37-3: KillKey.py 


import win32api
import win32con

def KillKey(ParentKeyHandle, KeyName):
    KeyHandle = win32api.RegOpenKeyEx(ParentKeyHandle, KeyName,
        0, win32con.KEY_ALL_ACCESS)
    while 1:
        try:
            # We always retrieve subkey number 0, because
            # when we delete a subkey, the old subkey #1
            # becomes #0:
            SubKeyName = win32api.RegEnumKey(KeyHandle, 0)
        except win32api.error:
            break  # No subkeys left
        KillKey(KeyHandle, SubKeyName)
    print "Deleting", KeyName
    win32api.RegCloseKey(KeyHandle)
    win32api.RegDeleteKey(ParentKeyHandle, KeyName)

# Create some keys:
RootKey=win32api.RegOpenKeyEx(win32con.HKEY_LOCAL_MACHINE,
    "SYSTEM",0,win32con.KEY_ALL_ACCESS)
win32api.RegCreateKey(RootKey,"Junk")
win32api.RegCreateKey(RootKey,"Junk\\Stuff")
win32api.RegCreateKey(RootKey,"Junk\\Stuff\\Wooble")
win32api.RegCreateKey(RootKey,"Junk\\Stuff\\Weeble")
win32api.RegCreateKey(RootKey,"Junk\\More stuff")

# Delete all the keys:
KillKey(RootKey,"Junk")


Other registry functions 

The function RegQueryInfoKey(KeyHandle) returns key metadata, in a tuple of
the form (SubKeyCount, ValueCount, ModifiedTime). Here SubKeyCount and
ValueCount are the key’s total subkeys and values, respectively. ModifiedTime, if
nonzero, is the key’s last modification date, in units of 100 nanoseconds since January 1, 1601.
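If you need ModifiedTime as an ordinary date, the conversion is simple arithmetic. The following is a minimal sketch in modern Python 3 (not the Python 2 used by the listings), assuming ModifiedTime is the raw integer count of 100-nanosecond intervals described above; the helper names filetime_to_datetime and filetime_to_unix are hypothetical.

```python
from datetime import datetime, timedelta

# Windows timestamps count 100-nanosecond intervals from this epoch (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1)

# The Unix epoch (1970-01-01) expressed in the same 100-ns units.
UNIX_EPOCH_AS_FILETIME = 116444736000000000

def filetime_to_datetime(filetime):
    """Convert 100-ns intervals since 1601-01-01 to a naive UTC datetime."""
    # Ten 100-ns intervals make one microsecond; finer precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

def filetime_to_unix(filetime):
    """Convert the same count to seconds since 1970-01-01 (a Unix timestamp)."""
    return (filetime - UNIX_EPOCH_AS_FILETIME) / 10**7
```

For example, filetime_to_datetime(ModifiedTime) gives the modification time as a naive UTC datetime. Integer division by 10 discards the sub-microsecond part, which datetime cannot represent anyway.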

Changes made to the registry do not take effect immediately; they take effect
sometime soon after you close the registry key handle. You can commit registry
changes immediately with a call to RegFlushKey(KeyHandle).

You can save a registry key (and all its subkeys) to a file by calling
RegSaveKey(KeyHandle, FileName). Later, you can restore registry settings from
disk with a call to RegLoadKey(Hive, Subkey, FileName). These operations
require special privileges that you must activate programmatically; see the
win32security API documentation for details.

Accessing the registry with _winreg

The standard library module _winreg also exposes the Windows registry API. Since the
underlying API is the same, the functions in _winreg are very similar to the registry
API in win32api. Table 37-3 shows the correspondence:


Table 37-3
_winreg and win32api Functions

_winreg Function    win32api Function
CloseKey            RegCloseKey
ConnectRegistry     RegConnectRegistry
CreateKey           RegCreateKey
DeleteKey           RegDeleteKey
DeleteValue         RegDeleteValue
EnumKey             RegEnumKey
EnumValue           RegEnumValue
FlushKey            RegFlushKey
LoadKey             RegLoadKey
OpenKey             RegOpenKeyEx
QueryInfoKey        RegQueryInfoKey
QueryValueEx        RegQueryValueEx
SaveKey             RegSaveKey
SetValueEx          RegSetValueEx


Using msvcrt Goodies

The msvcrt module, part of the Python distribution on Windows, exposes some
useful Windows-specific services from the VC++ runtime library.

Console I/O

You can read a line of input from the user with a call to sys.stdin.readline, and
you can handle single-character input with curses, available on most UNIX systems.
But what if you want to handle one character at a time on Windows? msvcrt
provides the functions you need.

The function getch reads one keystroke from the user, and returns the resulting 
character. The call to getch is synchronous: it does not return until the user hits a 
key. For example, this code prints the characters you type until you press
Control-Break (which is not handled by getch):

import msvcrt 
while 1: 

print msvcrt.getch() 

Hitting a special key (such as F1) puts two characters on the keystroke buffer. The
first is an escape character (either chr(0) or chr(224)). The two characters,
together, encode the special key.

The function ungetch(char) is the opposite of getch; it puts a character back
onto the keystroke buffer. You can only un-get one character at a time. The function
kbhit() returns true if any characters are waiting on the keystroke buffer. And the
function putch(char) writes the specified character to the console without
buffering. For example, this code writes out some text s-l-o-w-l-y:

import msvcrt, time
for Char in "Hello there!":
    time.sleep(0.1)
    msvcrt.putch(Char)

Other functions 

The function setmode(FileDescriptor, Flag) sets the line-end translation mode
for the specified file. Here FileDescriptor is the file’s descriptor (as returned by
os.open), and Flag should be os.O_TEXT or os.O_BINARY.

The function locking(FileDescriptor, Mode, Bytes) wraps the C runtime
function _locking, enabling you to lock specified bytes of a file.

You can translate between file handles and file descriptors. The function
open_osfhandle(Handle, Flags) produces a file descriptor for the specified file
handle. The available flags to set are os.O_TEXT, os.O_APPEND, and os.O_RDONLY.
The function get_osfhandle(FileDescriptor) provides a file handle for the
specified file descriptor.

The function heapmin tidies up the heap, freeing unused blocks for use. It is
available on Windows NT/2000, but not 95 or 98.


Summary 

If you’re like me (and I know I am), you use Windows systems often. So it’s a good
thing that Python supports Windows programming. In this chapter, you:

- Tried out the Python Extensions for Windows (win32all).

- Tweaked the Windows registry.

- Handled single-character input with msvcrt.

The next chapter moves from the Windows side of the fence to UNIX.


UNIX-Compatible Modules


Most Python programs you write automatically work
on any platform that supports Python. Sometimes,
however, you need to write a platform-specific program but
still want to use Python because of its easier maintenance,
quicker development time, and so on.

This chapter shows you the modules that come with Python 
that are specific to UNIX-compatible platforms. Many of the 
functions are nearly identical to similarly named functions in 
C; although I try to give an introductory explanation to all of 
them, some are complex or system-dependent enough that 
you need to spend time reading through their UNIX man 
pages. 

Checking UNIX Passwords and Groups

The pwd module has functions for retrieving entries from the
UNIX account and password database (usually stored in
/etc/passwd). getpwnam(name) returns the entry for the
person with the given login name, and getpwuid(uid) returns
the same information but instead you provide the user’s
unique ID:

>>> import pwd
>>> pwd.getpwnam('dave')
('dave', '*', 1000, 1000, 'Dave Brueck', '/home/dave', '/usr/local/bin/tcsh')
>>> pwd.getpwuid(1000)
('dave', '*', 1000, 1000, 'Dave Brueck', '/home/dave', '/usr/local/bin/tcsh')



In This Chapter

Checking UNIX passwords and groups
Accessing the system logger
Calling shared library functions
Providing identifier and keyword completion
Retrieving file system and resource information
Controlling file descriptors
Handling terminals and pseudo-terminals
Interfacing with Sun's NIS "Yellow Pages"
Both functions return a seven-tuple of the form

(name, password, user ID, group ID, full name, home path, shell)

The password field is encrypted, and contains just an asterisk or 'x' if the actual
encrypted password is in the shadow password file (/etc/shadow).

The getpwall() function returns a list (in no particular order) of all entries in the user
database.
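In later Python versions, the entries are struct_passwd objects that still behave like the seven-tuple above but also offer named attributes such as pw_name, pw_dir, and pw_shell. The sketch below assumes such a version (Python 3); home_directory and users_with_shell are hypothetical helper names for illustration.

```python
import pwd

def home_directory(username):
    """Return the home directory for username, or None if no such user exists."""
    try:
        entry = pwd.getpwnam(username)
    except KeyError:  # getpwnam raises KeyError for an unknown login name
        return None
    return entry.pw_dir  # same value as entry[5] in the tuple form

def users_with_shell(shell):
    """Return the sorted login names of every account whose login shell is shell."""
    return sorted(p.pw_name for p in pwd.getpwall() if p.pw_shell == shell)
```

Because the entries still index like tuples, older code that uses entry[5] keeps working alongside code that uses entry.pw_dir.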

You can use the crypt module to see if a password value is correct for a given user 
(if your program requires that a user “sign in,” for example): 

import crypt, pwd

def checkPass(username, password):
    'Returns 1 if the password is correct'
    try:
        epass = pwd.getpwnam(username)[1]
    except KeyError:
        # Unknown user: use a dummy value that can never match
        epass = 'BLAH'
    return epass == crypt.crypt(password, epass)

For non-GUI programs, the getpass() function in the getpass module is a safe
way to request that the user input his or her password because it returns the string
the user enters without echoing the characters to the screen. Most GUI toolkits
such as wxPython have similar functions for safely requesting passwords.

The grp module is similar to pwd except that it returns entries from the groups
database. getgrnam(name) returns the entry for the group of the given name and
getgrgid(gid) returns the same information except that you supply the group ID:

>>> import grp 

>>> grp.getgrnam('operator')

('operator', '*', 5, ['root']) 

>>> grp.getgrgid(5) 

('operator', '*', 5, ['root']) 

The information returned is a four-tuple of the form 

(group name, group password, group ID, list of group members) 

The group password is often blank (or an asterisk), and the member list usually 
doesn't include the group entries from the password database (so you need to 
look in both databases for a complete list of group members). 
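Following that advice, here is a minimal sketch (Python 3, using the named attributes gr_mem, gr_gid, pw_gid, and pw_name available in later Python versions) of a hypothetical group_members helper that merges both sources of membership.

```python
import grp
import pwd

def group_members(group_name):
    """Return the sorted login names of everyone who belongs to group_name.

    The groups database lists explicit (supplementary) members, while the
    password database records each user's primary group, so check both.
    """
    group = grp.getgrnam(group_name)  # raises KeyError for an unknown group
    members = set(group.gr_mem)
    for user in pwd.getpwall():
        if user.pw_gid == group.gr_gid:
            members.add(user.pw_name)
    return sorted(members)
```

Using a set avoids listing a user twice when he or she appears in both databases.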

The getgrall() function returns an unordered list containing all entries in the
groups database.

Accessing the System Logger

UNIX systems have a systemwide logging facility for programs to use. The
functions in the syslog module let you send messages and alter their priorities and
destinations.

In the simplest case, you can send a message to the system logging daemon by
calling syslog([priority], message). The optional priority can be any of the
values listed in Table 38-1 (listed from highest to lowest).

>>> import syslog 

>>> syslog.syslog(syslog.LOG_EMERG,
...               'UPS loses power in 2 minutes!')

After the above call, all users on my FreeBSD machine see: 

Message from syslogd@ at Wed Dec 6 02:50:43 2000 ...
python: UPS loses power in 2 minutes! 


Table 38-1
syslog Priority Values

Value          Meaning
LOG_EMERG      Panic condition (normally sent to all users)
LOG_ALERT      Condition that needs immediate correction
LOG_CRIT       Critical conditions like hard device errors
LOG_ERR        Errors
LOG_WARNING    Warnings
LOG_NOTICE     Nonerrors that might still warrant special handling
LOG_INFO       Informational messages (this is the default priority)
LOG_DEBUG      Debugging messages


The system logger maintains an internal mask of message priorities that it should
log; it ignores messages with priorities that are not in its mask. setlogmask(mask)
sets the internal mask to mask and returns the previous value. LOG_MASK(pri)
calculates the mask value for the given priority, and LOG_UPTO(pri) calculates a log
mask that includes priorities from LOG_EMERG down to (and including) the
priority pri:

>>> from syslog import *
>>> setlogmask(LOG_MASK(LOG_ALERT))    # Only LOG_ALERT messages get logged.
>>> setlogmask(LOG_UPTO(LOG_ALERT))    # Allows EMERG and ALERT.
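Because a mask is simply a bit field with one bit per priority, the relationship between LOG_MASK and LOG_UPTO is easy to check directly. Here is a small sketch for Python 3 (mask_for is a hypothetical helper). It also relies on the C library convention that setlogmask(0) reports the current mask without changing it.

```python
import syslog

def mask_for(*priorities):
    """Build a mask that enables exactly the given priorities."""
    mask = 0
    for pri in priorities:
        mask |= syslog.LOG_MASK(pri)        # one bit per priority
    return mask

# LOG_UPTO(LOG_ALERT) covers LOG_EMERG and LOG_ALERT: the same bits you get
# by OR-ing the two individual masks together.
up_to_alert = syslog.LOG_UPTO(syslog.LOG_ALERT)
assert up_to_alert == mask_for(syslog.LOG_EMERG, syslog.LOG_ALERT)

current = syslog.setlogmask(0)              # 0 means "just report the mask"
previous = syslog.setlogmask(up_to_alert)   # now only EMERG and ALERT pass
syslog.setlogmask(previous)                 # put the original mask back
```

Saving the value returned by setlogmask and passing it back later is a simple way to narrow logging temporarily without losing the original configuration.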
For greater control over message logging, call openlog(ident[, logopt[,
facility]]). ident is an identifier prefix to include in every message, and logopt
is a bit field that chooses one or more options from Table 38-2.



Table 38-2
openlog Logging Option Flags

Flag           Meaning
LOG_CONS       Messages go to the console if sending to the logging daemon fails.
LOG_NDELAY     Connect to the logging daemon immediately (instead of waiting
               until you log the first message).
LOG_PERROR     Write the message to stderr as well as the system log.
LOG_PID        Include the process ID with the log message.


The facility parameter to openlog assigns a default facility, or classification,
to messages whose priority value does not already include a facility. Table 38-3
lists the possible values.



Table 38-3
openlog Facility Values

Value          Meaning
LOG_AUTH       Authorization system (from login, su, and so forth)
LOG_AUTHPRIV   Same, but logged to a non-world-readable file
LOG_CRON       From the cron daemon
LOG_DAEMON     System daemons
LOG_FTP        The ftp daemons
LOG_KERN       Kernel-generated messages
LOG_LPR        The line printer spooling system
LOG_MAIL       Mail system
LOG_NEWS       Network news system
LOG_SYSLOG     Internal syslog messages
LOG_USER       Messages from any user process (this facility is the default)
LOG_UUCP       The UUCP system
LOG_LOCAL0     Reserved for local use (also LOG_LOCAL1 through LOG_LOCAL7)
The closelog() function closes the connection to the system logger.
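Putting openlog, syslog, and closelog together gives the following minimal sketch for Python 3. The identifier 'myapp', the LOG_LOCAL0 facility, and the helper name log_events are placeholders chosen for illustration. The second message ORs a facility into the priority, which overrides the default facility given to openlog.

```python
import syslog

def log_events(events):
    """Log each (priority, text) pair, tagged with 'myapp' and the PID."""
    # LOG_LOCAL0 becomes the default facility for priorities without one.
    syslog.openlog('myapp', syslog.LOG_PID, syslog.LOG_LOCAL0)
    try:
        for priority, text in events:
            syslog.syslog(priority, text)
    finally:
        syslog.closelog()          # release the connection to the logger

log_events([
    (syslog.LOG_NOTICE, 'service started'),                    # LOCAL0
    (syslog.LOG_ERR | syslog.LOG_DAEMON, 'disk nearly full'),  # DAEMON
])
```

Where these messages end up (which file, or whether they appear at all) depends on how the system's logging daemon is configured; if no daemon is running, the calls simply have no visible effect.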


Calling Shared Library Functions 


The dl module lets you dynamically load and call functions that exist in C shared
libraries.



As much as possible, avoid using this module. It is inherently platform-specific
and makes it much easier to crash your programs.


Before you can call a shared library function, you have to open the library by call-
ing open(name[, mode]). The mode can be RTLD_LAZY (the default) or RTLD_NOW
to denote late or immediate binding, although some platforms do not provide
RTLD_NOW (in which case the module won’t even have RTLD_NOW).


Upon success, open returns a dl object. To see if the object has a specific function,
call its sym(name) method, which returns the function’s address, or None if the
library has no such function:

>>> import dl
>>> dlo = dl.open('/usr/lib/libc.so')
>>> dlo.sym('getpid')
673070304
>>> dlo.sym('destroyworld')
>>>


A dl object’s call(name[, arg1, arg2, ...]) method calls a function in the library. You
can pass in up to 10 arguments; they can be integers, strings, or None for NULL. The
function you call should return no value or an integer:

>>> import dl, os
>>> dlo = dl.open('/usr/lib/libc.so')
>>> dlo.call('getpid')          # The "bad" way
3539
>>> os.getpid()                 # The "good" way
3539
>>> dlo.call('daemon', 1, 0)    # Make the process a daemon process.

When you’re finished with a dl object, call its close() method to free its
resources. On most systems, however, the memory taken up by the library won’t be
freed until the main program shuts down.
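The dl module did not survive into Python 3. For comparison, here is a rough equivalent of the sym and call examples above that uses the standard ctypes module instead. It is a different API swapped in for illustration, not the dl interface itself. The sketch assumes a Unix-like system. If ctypes.util.find_library('c') returns None, CDLL(None) falls back to the symbols already loaded into the Python process, which on Linux include the C library. has_symbol is a hypothetical helper.

```python
import ctypes
import ctypes.util

# Locate the C library; None asks dlopen() for the symbols already loaded
# into this process instead.
libc = ctypes.CDLL(ctypes.util.find_library('c'))

def has_symbol(lib, name):
    """Rough counterpart of dl's sym(): True if lib exports name."""
    return hasattr(lib, name)      # ctypes raises AttributeError if missing

libc.getpid.restype = ctypes.c_int   # declare the C return type
libc.getpid.argtypes = []            # getpid() takes no arguments

print(has_symbol(libc, 'getpid'), has_symbol(libc, 'destroyworld'))
print(libc.getpid())
```

As with dl, os.getpid() remains the better way to get the process ID. Reach for ctypes only when no Python module already wraps the function you need.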


Providing Identifier and Keyword Completion 

The readline and rlcompleter modules work together to add useful editing func-
tionality to Python’s user input routines (including how the interpreter works in
interactive mode).
The Python readline module calls the rather large GNU readline library. This sec-
tion covers only some of its features; for a complete list of the features available
through the readline module, you should visit the readline section of the GNU
Web site (www.gnu.org).

Use the following code to try out tab-completion support:

>>> import rlcompleter
>>> import readline
>>> readline.parse_and_bind('tab: complete')
>>> rea       # Now hit the Tab key!

Pressing the Tab key completes the partial identifier. If more than one completion
is possible, you’ll hear a beep. Pressing Tab a second time lists the possible
completions:

>>> r        # Press Tab twice!
raise        raw_input    reduce       repr         rlcompleter
range        readline     reload       return       round
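Behind the scenes, readline asks the completer for candidates by calling its complete(text, state) method with state 0, 1, 2, and so on until it returns None. The following sketch for Python 3 drives rlcompleter.Completer the same way without a terminal. all_completions and the names in the namespace dictionary are made up for illustration.

```python
import rlcompleter

def all_completions(text, namespace):
    """Ask an rlcompleter.Completer for every completion of text, calling
    complete() with state 0, 1, 2, ... until it returns None."""
    completer = rlcompleter.Completer(namespace)
    matches = []
    state = 0
    while True:
        match = completer.complete(text, state)
        if match is None:
            break
        matches.append(match)
        state += 1
    return matches

# 'rabbit' and 'rabid' are made-up names in a throwaway namespace.
print(all_completions('rab', {'rabbit': 1, 'rabid': 2}))
```

In current Python 3 versions, callable objects come back with a trailing "(" and most keywords with a trailing space. This is why pressing Tab after a function name leaves you ready to type its arguments.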

With readline installed, you can use the keys listed in Table 38-4 for cursor naviga-
tion and editing.

Note C-x means press and hold Ctrl while you press x. M-x means the same but with
the Meta key. On systems that do not have a Meta key, the Esc key works by
default instead, although you should not press and hold Esc. In this case, M-x
means press and release Esc, then press and release x.



Table 38-4
readline Key Bindings

Key Sequence   Action
C-b            Move back one character
M-b            Move back one word
C-f            Move forward one character
M-f            Move forward one word
C-a            Move to the start of the line
C-e            Move to the end of the line
DEL            Delete the character to the left of the cursor
C-d            Delete the character under the cursor
C-_            Undo
C-l            Clear the screen, reprinting the current line at the top
In readline terms, cutting and pasting text are killing and yanking, respectively.
Cutting, or killing, text saves it to a kill ring from which it can later be “yanked
back.” Consecutive kills get saved to the same buffer (so that a single yank brings it
all back at once). Table 38-5 lists the kill and yank keystrokes.


Table 38-5
readline Kill and Yank Key Bindings

Key Sequence   Action
C-k            Kill to end of line
M-d            Kill to end of word
M-DEL          Kill to start of word
C-w            Kill to previous whitespace
C-y            Yank most recently killed text
M-y            Rotate the kill ring buffer and yank the new top


You can use M-y only right after you yank text (with C-y). It cycles through the kill
ring buffer, showing you the available text.

The readline module also lets you save keystrokes as a macro that you can later
play back as if you had retyped them. C-x ( (left parenthesis) starts recording
keystrokes and C-x ) (right parenthesis) stops. From then on you can use C-x e to
replay the saved keystrokes.

The command history stores each command you type. C-p and C-n cycle through
the previous and next entries in the history (these functions are often bound to the
up and down arrow keys too). Call readline’s get_history_length() function to
see how many commands the list can hold (a negative value means an unlimited
number) and set_history_length(newlen) to set the maximum history length.
write_history_file([file]) writes the history to a file and
read_history_file([file]) reads a previously saved file (both use ~/.history
if you don’t supply a file).
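A common pattern is to reload the history at startup and write it back automatically when Python exits. Here is a minimal sketch for Python 3, assuming the readline module is available (some builds lack it). The file name ~/.python_history_sketch, the 500-entry limit, and the name setup_history are illustrative choices. In Python 3's readline module, set_history_length() controls how many lines write_history_file() keeps.

```python
import atexit
import os
import readline

# Hypothetical file name; any writable path would do.
HISTORY_FILE = os.path.expanduser('~/.python_history_sketch')

def setup_history(path=HISTORY_FILE, max_entries=500):
    """Reload saved history (if any) and save it again when Python exits."""
    readline.set_history_length(max_entries)   # lines kept when writing
    if os.path.exists(path):
        readline.read_history_file(path)
    atexit.register(readline.write_history_file, path)
```

You might call setup_history() from the file named by the PYTHONSTARTUP environment variable. Recent Python 3 versions already save interactive-interpreter history to ~/.python_history on their own. A sketch like this matters mostly for programs whose own input() prompts use readline.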

Retrieving File System and Resource Information

The os module contains two functions for retrieving file system information:
statvfs(path) returns information for the file system that contains the given
path, and fstatvfs(fd) does the same thing except that you provide a file
descriptor.
File system information

The statvfs module contains constants for interpreting the tuples returned by the 
statvfs and fstatvfs functions. Table 38-6 describes the different values 
available. 


Table 38-6
statvfs Identifiers

Identifier     Meaning
F_FILES        Total number of file nodes
F_FFREE        Total number of free file nodes
F_FAVAIL       Number of free nodes available to nonsuperusers
F_NAMEMAX      Maximum file name length
F_BLOCKS       Total number of blocks
F_BFREE        Total number of free blocks
F_BAVAIL       Number of free blocks available to nonsuperusers
F_BSIZE        Preferred file system block size
F_FRSIZE       Fundamental file system block size
F_FLAG         System-dependent flags


For example, the following code calculates what fraction of file blocks are not
in use:

>>> import os, statvfs
>>> info = os.statvfs('/tmp')
>>> print '%.2f of the blocks are free' % \
...       (info[statvfs.F_BFREE] * 1.0 / info[statvfs.F_BLOCKS])
0.94 of the blocks are free
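The statvfs module no longer exists in Python 3. There, the object returned by os.statvfs() has named attributes (f_blocks, f_bfree, f_bavail, and so on) that correspond to the F_ constants in Table 38-6. Here is a sketch of the same calculation, reported as percentages. free_block_percentages is a hypothetical helper name.

```python
import os
import tempfile

def free_block_percentages(path):
    """Return (percent of blocks free, percent available to nonsuperusers)
    for the file system that contains path."""
    info = os.statvfs(path)
    if info.f_blocks == 0:            # some pseudo file systems report no blocks
        return 0.0, 0.0
    free = 100.0 * info.f_bfree / info.f_blocks     # F_BFREE / F_BLOCKS
    avail = 100.0 * info.f_bavail / info.f_blocks   # F_BAVAIL / F_BLOCKS
    return free, avail

free, avail = free_block_percentages(tempfile.gettempdir())
print('%.2f%% of blocks are free (%.2f%% for ordinary users)' % (free, avail))
```

The two numbers differ on file systems that reserve blocks for the superuser, which is why the F_BAVAIL value is usually the more honest measure of usable space.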


Resource usage 

The resource module is useful for tracking resource usage. getrusage(who)
returns a tuple of values described in Table 38-7. The who parameter can be
RUSAGE_SELF (to request information about the current process only),
RUSAGE_CHILDREN (for information about child processes), or RUSAGE_BOTH (for
information about the current process and its children).


Table 38-7
getrusage Tuple Values

Index   Value
0       Time spent executing in user mode
1       Time spent in the system executing on behalf of the process(es)
2       Maximum resident set size used
3       Shared memory used in the text segment
4       Unshared memory used in the data segment
5       Unshared memory in the stack segment
6       Page faults serviced without any I/O activity
7       Page faults serviced that required I/O activity
8       Times the process was swapped out of main memory
9       Times the file system had to perform input
10      Times the file system had to perform output
11      Number of IPC messages sent
12      Number of IPC messages received
13      Number of signals delivered
14      Number of voluntary (early) context switches
15      Number of forced context switches


>>> import resource
>>> resource.getrusage(resource.RUSAGE_SELF)
(0.077617, 0.181107, 1588, 3300, 2292, 1280, 140, 0, 0, 0,
0, 0, 0, 0, 50, 3)
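In Python 3, getrusage() returns an object whose fields can also be read by name (ru_utime, ru_stime, ru_maxrss, and so on, in the order of Table 38-7). The sketch below uses the first two fields to measure the CPU time a single call consumes. The helper name cpu_seconds is hypothetical, and the precision depends on the operating system's accounting.

```python
import resource

def cpu_seconds(func, *args):
    """Call func(*args) and return (result, user CPU seconds,
    system CPU seconds) that this process spent during the call."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = func(*args)
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (result,
            after.ru_utime - before.ru_utime,    # index 0 in Table 38-7
            after.ru_stime - before.ru_stime)    # index 1 in Table 38-7

total, user, system = cpu_seconds(sum, range(10 ** 6))
print('user %.3fs, system %.3fs' % (user, system))
```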

The resource.getpagesize() function returns the system page size (the
number of bytes in a memory page). Multiply this value by the number of pages in
use to get how many bytes of memory a process is using. Note that the system
page size is not necessarily the same as the underlying hardware's page size.

Resource limits 

You can also use the resource module to get and set resource limits. Each control-
lable resource has a soft limit and a hard limit. When a process’s resource usage
crosses a soft limit, the system intervenes: for some resources, such as CPU time,
the process receives a signal indicating that it has crossed that boundary, while for
others, such as the number of open files, the request that would cross the limit
simply fails. A process can never exceed a hard limit, however. Attempting to do so
can result in the termination of the process.
Only superusers can raise a hard limit; an ordinary process can lower its own hard
limit but cannot raise it again afterward.


The getrlimit(resource) function returns a tuple (soft, hard) containing the
limit values for that resource. setrlimit(resource, (soft, hard)) sets new lim-
its for resource (you can use limit values of -1 to specify the maximum allowable
value). Table 38-8 lists the resource names and their meanings (sizes are in bytes);
if a particular platform does not support a resource, then it will not be in the
resource module.


Table 38-8
Resource Names and Meanings

Name             Maximum Value of
RLIMIT_AS        Address space area
RLIMIT_CORE      Size that a core file can have
RLIMIT_CPU       Number of seconds to be used by each process
RLIMIT_DATA      Size of a process's data segment
RLIMIT_FSIZE     File size
RLIMIT_MEMLOCK   Address space you can lock into memory
RLIMIT_NOFILE    Number of open files per process
RLIMIT_NPROC     Number of simultaneous processes for this user
RLIMIT_RSS       Resident set size
RLIMIT_STACK     Stack segment size
RLIMIT_VMEM      Mapped memory occupied by the process


To see the soft and hard limits on the maximum number of open files per process, 
for example, you can use the following code: 

>>> import resource
>>> resource.getrlimit(resource.RLIMIT_NOFILE)
(1064L, 1064L)
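Because an ordinary process may move its soft limit anywhere up to the hard limit, a common pattern is to change only the soft limit and remember the old pair so it can be restored. Here is a sketch for Python 3. set_soft_limit is a hypothetical helper, and RLIM_INFINITY is the resource module's constant for "no limit."

```python
import resource

def set_soft_limit(res, new_soft):
    """Change only the soft limit of res, keep the hard limit, and return
    the previous (soft, hard) pair so the caller can restore it."""
    soft, hard = resource.getrlimit(res)
    # Without superuser rights the soft limit must stay at or below the hard
    # limit; RLIM_INFINITY means the hard limit imposes no ceiling.
    if hard != resource.RLIM_INFINITY and new_soft > hard:
        raise ValueError('soft limit %d exceeds hard limit %d'
                         % (new_soft, hard))
    resource.setrlimit(res, (new_soft, hard))
    return soft, hard
```

For example, old = set_soft_limit(resource.RLIMIT_NOFILE, 64) caps open files at 64 for the rest of the run, and resource.setrlimit(resource.RLIMIT_NOFILE, old) undoes the change.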


Controlling File Descriptors 

The functions in the fcntl module operate on file descriptors, which you can
access by calling a file or socket object’s fileno() method. The options for these
functions vary by platform; see your system’s man pages for details.
The fcntl(fd, op[, arg]) and ioctl(fd, op[, arg]) functions perform the
operation op on the file descriptor fd. If arg is an integer, the functions return inte-
gers. If the particular operation requires a C structure, you can pass in a string
object created using struct.pack; in this case the functions return a string repre-
senting the modified buffer you passed in.

Tip The FCNTL module defines names for many of the operations you'd pass to
fcntl. For example, fcntl.fcntl(file.fileno(), FCNTL.F_GETFD)
returns the close-on-exec flag for the given file descriptor.
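In Python 3 the FCNTL module is gone, and the fcntl module itself defines F_GETFD, F_SETFD, FD_CLOEXEC, F_GETFL, and F_SETFL. The sketch below shows the usual read-modify-write pattern for descriptor flags. set_close_on_exec and set_nonblocking are hypothetical helper names, and O_NONBLOCK comes from the os module.

```python
import fcntl
import os

def set_close_on_exec(fd, enabled=True):
    """Turn the close-on-exec flag on or off; return the new flag word."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)      # descriptor flags
    if enabled:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)
    return fcntl.fcntl(fd, fcntl.F_GETFD)

def set_nonblocking(fd):
    """Add O_NONBLOCK to the file status flags so reads never wait."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)      # file status flags
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
```

Reading the current flags first and changing only one bit avoids accidentally clearing other flags that were already set on the descriptor.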

The flock(fd, op) function performs a locking operation on a file descriptor. This
operation lets multiple processes cooperatively have simultaneous access to an
open file (although some other rogue process might still access the file without
using locks; see the flock man pages for details). Valid operations are LOCK_SH
(shared lock), LOCK_EX (exclusive lock), LOCK_NB (don’t block when locking;
combine it with LOCK_SH or LOCK_EX using bitwise OR), and LOCK_UN (release a
lock).
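Here is a minimal sketch for Python 3 of taking an exclusive lock without waiting. OR-ing LOCK_NB into LOCK_EX makes flock() fail immediately if another holder has the lock, and Python reports that failure as BlockingIOError. The helper names are hypothetical. The lock is advisory: it only restrains processes that also call flock().

```python
import fcntl

def try_exclusive_lock(f):
    """Try to take an exclusive flock() on the open file f without waiting.
    Return True if the lock was acquired, False if someone else holds it."""
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False
    return True

def unlock(f):
    """Release a lock taken with try_exclusive_lock."""
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

A typical use is a single-instance guard: open a lock file at startup and exit if try_exclusive_lock returns False. The lock is also released automatically when the file is closed or the process exits.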


Handling Terminals and Pseudo-Terminals

The termios and TERMIOS modules implement POSIX-style terminal (tty) control.
termios defines a few functions to use, and TERMIOS defines “constants” (equiva-
lent to their C counterparts) that you pass to those functions.

The tcgetattr(fd) function gets the terminal state referenced by the file descrip-
tor fd and returns it in a list defined as:

[input flags, output flags, control flags, local flags,
 input speed, output speed, control characters]

The control characters entry is a list of one-character strings. You can set a tty’s
attributes using tcsetattr(fd, when, attributes). attributes is in the same
form as returned by tcgetattr, and when tells you when the attribute changes
should take place. It can be any of the following constants (defined in TERMIOS):
TCSANOW (make the changes immediately), TCSADRAIN (wait for the system to trans-
mit to the terminal all data you’ve written to fd and then make the changes), or
TCSAFLUSH (same, but also discard any unread input).

The tcdrain(fd) function waits for the system to transmit to the terminal the out-
put you’ve written to fd. tcflush(fd, queue) discards queued data on fd. If
queue is TCIFLUSH, it discards the input queue data; if TCOFLUSH, it flushes the out-
put queue; and if TCIOFLUSH, it flushes both queues.

The tcflow(fd, action) function suspends or resumes I/O on fd. Actions TCIOFF
and TCION suspend and resume input, and TCOOFF and TCOON suspend and resume
output.
The tcsendbreak(fd, duration) function sends a break (a stream of zero bits) on
fd. If duration is 0, it sends the break for between a quarter and half a second; the
behavior of nonzero values varies by platform (many systems ignore the value anyway).
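A classic use of tcgetattr and tcsetattr is turning off echo while the user types a password. Here is a minimal sketch for Python 3. echo_disabled is a hypothetical helper; in practice, the standard getpass module already does this for you. The sketch changes only the ECHO bit in the local-flags entry (index 3 of the list) and always restores the saved attributes.

```python
import contextlib
import termios

@contextlib.contextmanager
def echo_disabled(fd):
    """Temporarily turn off echo on the terminal referred to by fd."""
    saved = termios.tcgetattr(fd)
    changed = termios.tcgetattr(fd)              # a separate list to modify
    changed[3] = changed[3] & ~termios.ECHO      # index 3: local flags
    termios.tcsetattr(fd, termios.TCSADRAIN, changed)
    try:
        yield
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)   # always restore
```

On a real terminal you might write: with echo_disabled(sys.stdin.fileno()): secret = input('Password: '). If standard input is a pipe or file, tcgetattr raises termios.error instead.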

The tty module has two convenience functions for controlling terminals; internally
they call tcsetattr. Its setraw(fd[, when]) function changes fd into raw mode
(the system performs no I/O processing, so I/O data is “raw”). when can be any of
the same values you pass to tcsetattr (for example, TCSANOW). setcbreak(fd[,
when]) switches the terminal to cbreak mode, in which input is available a character
at a time without waiting for Enter and is not echoed, but keys such as Ctrl-C still
generate signals.
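For example, a program that reacts to single keystrokes can switch to cbreak mode just long enough to read one key. Here is a minimal sketch for Python 3. cbreak_mode and read_key are hypothetical names. The code saves the attributes with tcgetattr itself so that it can restore them afterward.

```python
import contextlib
import os
import termios
import tty

@contextlib.contextmanager
def cbreak_mode(fd):
    """Switch the terminal on fd to cbreak mode, restoring it afterward."""
    saved = termios.tcgetattr(fd)
    tty.setcbreak(fd, termios.TCSANOW)   # one key at a time, no echo
    try:
        yield
    finally:
        termios.tcsetattr(fd, termios.TCSANOW, saved)

def read_key(fd):
    """Wait for a single keystroke on fd and return it as bytes."""
    with cbreak_mode(fd):
        return os.read(fd, 1)
```

Calling read_key(sys.stdin.fileno()) waits for one key press when standard input is a terminal. Multibyte keys, such as arrow keys, arrive as several bytes, so a fuller version would read the rest of the sequence.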

The pty module enables you to create and control pseudo-terminals: you create a
separate process but can read and write to the process’s controlling terminal pro-
grammatically. This module works on at least Linux, but it hasn’t had as much test-
ing on other platforms.

The pty module’s spawn(argv) function spawns a child process and connects its control-
ling terminal to the parent process’s standard I/O. openpty() creates and returns a
pseudo-terminal pair of file descriptors in a two-tuple of the form (master,
slave). fork() forks the current process and connects the child’s controlling ter-
minal to a pseudo-terminal. The return value from fork is a two-tuple of the form
(pid, fd). On the parent side, pid is the child’s process ID and fd is a file descrip-
tor for the pseudo-terminal. In the child process, pid is 0.

Note The os module has forkpty and openpty functions that do the same thing, but
the pty version is the preferred one because it uses a more platform-independent
implementation.
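As an illustration of pty.fork, the following sketch for Python 3 runs a command with a pseudo-terminal as its controlling terminal and collects what it prints. run_in_pty is a hypothetical name. The read loop handles both common ways the end of output is reported: an OSError (EIO on Linux) or an empty read.

```python
import os
import pty

def run_in_pty(argv):
    """Run argv with a pseudo-terminal as its controlling terminal and
    return everything it wrote there, as bytes."""
    pid, fd = pty.fork()
    if pid == 0:
        # Child: stdin, stdout, and stderr are now the pseudo-terminal.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)            # reached only if exec fails
    chunks = []
    while True:
        try:
            data = os.read(fd, 1024)
        except OSError:              # Linux reports EIO once the child is gone
            break
        if not data:                 # other systems may signal plain EOF
            break
        chunks.append(data)
    os.close(fd)
    os.waitpid(pid, 0)               # reap the child
    return b''.join(chunks)
```

Because the child sees a real terminal, programs that behave differently when writing to one (for example, by buffering less) behave as they would interactively. Output lines typically come back ending in \r\n, since the terminal driver translates each newline.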


Interfacing with Sun's NIS "Yellow Pages" 

NIS is an RPC-based client/server service that allows a group of computers to share
a set of configuration files. It helps system administrators by enabling them to
update information in a central location (the NIS master server) and have that infor-
mation get propagated automatically to all NIS clients that are part of the same
group or domain.

Sun Microsystems originally designed NIS, but implementations are now available
on just about every UNIX derivative. Python’s nis module wraps a few of the more
useful NIS functions, but this module is really useful only if you already know some-
thing about NIS and have it up and running on your system.

The NIS master server maintains databases of information called maps; they basi-
cally map keys to values much like a Python dictionary. The maps() function
returns a list of all map names in the domain, and match(key, mapname) returns
the value associated with the given key in the map called mapname. cat(mapname)
returns a dictionary of key-value mappings for the given map.





NIS keys and values are arbitrary strings of bytes and not limited to just normal 
ASCII characters. 


Summary 

This chapter covered the standard Python modules that work only on UNIX-specific
platforms. In this chapter you learned to:

♦ Access the UNIX password and group databases.

♦ Write messages to the systemwide logger.

♦ Control file descriptors and pseudo-terminals.

♦ Call shared library functions and retrieve system information.

This chapter concludes the “Platform-Specific Support” part of the book. The
appendixes that follow cover some of the online resources available and show you
how to use popular Python development environments.





Online Resources


The Internet holds a wealth of information about Python,
as well as Python programs to do all sorts of things. This
appendix covers some of the key Internet resources for
Python.


Visiting This Book's Web Site 

We, the authors, maintain the Python Bible’s Web site at
www.pythonapocrypha.com. The site includes source code
printed in this book, extras that were too big to fit in, and
errata for any problems that (heaven forbid) made their way
into print. It also includes updated links to other Python stuff.
We hope that you find it a useful companion to the book itself.


Installing Software 

You can download the standard Python distribution from the
Python Language Web site (www.python.org), or directly
from SourceForge (http://sourceforge.net/projects/python/).
SourceForge is also the place to report bugs in
Python itself. (SourceForge is a good place to search for open-
source software in general, whether Python-related or not.)

You may prefer to download ActivePython, the Python distri-
bution by ActiveState. It is available for Linux, Solaris, and
Windows. ActivePython includes extras such as the Python
extensions for Windows. Visit www.activestate.com/
Products/ActivePython/ to check it out.
PythonWare publishes another Python distribution. It extends the standard distri-
bution with the Python Imaging Library (PIL), PythonWare Sound Toolkit, and
support for the commercial IDE PythonWorks.

If you often glue Python to Java, you may prefer JPython, an implementation of
Python written entirely in Java. Visit www.jpython.org for more information.

The Vaults of Parnassus (http://www.vex.net/parnassus/) are a general reposi-
tory of Python programs, organized by topic.

The Python Extensions for Windows, also known as win32all, are great resources
if you want to call the Windows API from Python. win32all also includes
PythonWin, a free Windows IDE for Python. Mark Hammond maintains win32all at
starship.python.net/crew/mhammond/.

If you plan to use Python for Web development, consider downloading Zope
(www.zope.org). Zope has a steep learning curve, but is a powerful program, com-
parable in abilities to most commercial application servers.


Finding Answers to Questions 

The Python FAQ is a good place for general questions; it lives at
www.python.org/doc/FAQ.html.

The FAQTs knowledge base includes a large, searchable collection of Python ques-
tions and answers. It covers a much broader range of topics than the main Python
FAQ. Visit python.faqts.com to check it out.

The main Python Web site includes topic guides, which are good starting places for
tackling specialized areas like databases, plotting, and so on
(http://www.python.org/topics/). Also available are HOWTOs: detailed guides to
very specific topics, like configuring your favorite editor for Python
(http://www.python.org/doc/howto/).

Also, the archives of the Special Interest Group (SIG) mailing lists or the Python
newsgroups (see below) may be a good place to search for specific topics.
Subscribing to Newsgroups and Mailing Lists

Two USENET newsgroups are of interest to Python users: comp.lang.python is an
open newsgroup for Python-related discussions. It is a fairly high-volume group,
carrying dozens of new posts each day. The summary group comp.lang.python.
announce is a moderated, low-volume newsgroup (about a dozen posts a week)
providing announcements of general interest. It is available as a mailing list; visit
http://mail.python.org/mailman/listinfo/python-announce-list to
sign up.

Archives of old USENET posts are often a good place to search for information,
although you’ll have to sift through some noise. One searchable archive of old
newsgroup postings lives at http://groups.google.com/.

Python users have formed several Special Interest Groups to discuss various
Python topics. For example, you can find an XML processing SIG, an international-
ization SIG, and a threading SIG. Visit http://www.python.org/sigs/ to sub-
scribe to the SIG mailing lists or view the archives.

Understanding PEPs: Python Enhancement Proposals

New features for Python are first proposed in PEPs (Python Enhancement
Proposals). To get an idea of what new features are coming to Python, you can
browse the list of PEPs online at http://python.sourceforge.net/peps/. In par-
ticular, PEP number 1 is a description of PEPs, and how to go about creating and
submitting them.



Python Development Environments


Several good editors are available for writing Python pro-
grams. In addition, you can find some integrated devel-
opment environments (IDEs) for Python that combine an
editor with a debugger, a class browser, and more. This
appendix provides an overview of some of the available soft-
ware, plus a detailed look at IDLE.


Overview of Python IDEs

Integrated DeveLopment Environment (IDLE) is a free develop-
ment environment for Python, written in Python. It includes a
syntax-highlighting editor, a debugger, and a class browser. It
is part of the standard Python distribution, and uses Tkinter
for its user interface.

Home page: http://www.python.org/idle/

Pros: Comes with Python; runs on many operating systems

Cons: No layout designer for GUI programs

PythonWin is a free Python IDE for Windows. It offers the same
features as IDLE, with somewhat spiffier packaging.

PythonWin is part of the Python extensions for Windows
(win32all), which are included in the ActivePython distribu-
tion. It can integrate with Microsoft Visual SourceSafe (VSS).
Home page: http://www.activestate.com/Products/ActivePython/win32all.html

Pros: Excellent for COM applications; very easy to learn if you
know Microsoft Visual Studio

Cons: Platform-specific


PythonWorks is a commercial Python IDE for Windows, Linux, and Solaris. It
includes a layout editor for graphical development of Tkinter GUIs and a
deployment tool, which packages projects for distribution. In addition, it integrates
with the Perforce version control system.


Home page: http://www.pythonware.com/products/works/

Pros: Easy to create Tkinter layouts; version control integration;
slick-looking

Cons: The price tag (currently around $400 for an individual
license)


Wing IDE is a commercial Python IDE for Linux. It provides a customizable graphical
interface for development.

Home page: http://archaeopteryx.com/wingide

Pros: Ease of customization; Wing IDE can behave like Emacs or
more like a standard Windows application

Cons: Currently platform-specific


Boa Constructor is a free IDE for building GUI programs using the wxPython toolkit.

Home page: http://boa-constructor.sourceforge.net/

Pros: Fast and precise GUI layout

Cons: Debugger not fully implemented


BlackAdder is a commercial Python IDE for Linux and Windows. It includes support
for the Qt windowing toolkit, a library similar to wxWindows.

Home page: http://www.thekompany.com/products/blackadder/

Pros: Nice Qt support

Cons: Still in beta; requires a Qt installation
Configuring Editors for Python Source

Note: In the following section, C-X means “Hold down the Control key and press X”
and M-X means “Hold down the Meta (or Alt) key and press X.” The notation may
string keystrokes together; for example, C-X C-B means “Type Control-X and then
Control-B.” (This is the usual notation for Emacs commands.)

A Python mode for Emacs is available; it makes editing Python code in Emacs
much easier. Your copy of Emacs may already have Python mode available. If not,
first visit http://www.python.org/emacs/python-mode/ to download it. Install
python-mode.el into the correct directory (probably lisp/progmodes, below the
main emacs directory).

Next, for improved speed, byte-compile the file. Within Emacs, type M-X, and then
byte-compile-file. Then give the full path to python-mode.el. Emacs will create
python-mode.elc and print some warnings that you can ignore.

Next, add some lines to the bottom of your .emacs file to ensure that files with a
.py extension are opened in Python mode. If you don’t have an .emacs file, create a
new file named “.emacs” in your home directory, and paste the following lines into
it. (Emacs executes the Lisp code from the .emacs file when it starts up, so you can
put all sorts of stuff into the .emacs file to customize Emacs behavior.)

(setq auto-mode-alist
      (cons '("\\.py$" . python-mode) auto-mode-alist))
(setq interpreter-mode-alist
      (cons '("python" . python-mode)
            interpreter-mode-alist))
(autoload 'python-mode "python-mode" "Python editing mode." t)

Now, open up some Python source code in Emacs. (Type C-X C-F, and then type the 
path to the source file.) The file should show up in color — one color for identifiers, 
another for comments, and so on. If not, you need to turn on syntax highlighting (or 
font-lock, as Emacs calls it). Put the following lines at the bottom of your .emacs file 
to activate global font-lock: 

(cond ((fboundp 'global-font-lock-mode)
       (global-font-lock-mode t)
       (setq font-lock-maximum-decoration t)))

Start Emacs again, load the file, and enjoy the pretty colors. 

Using Python mode 

You (probably) have two new menus available when you open a Python source file.
The Python menu enables access to all the Python-mode commands. The IM-Python
menu lets you jump to any class, function, or method definition (very useful!). If
you don’t have these menus, you can get them by installing the easymenu.el pack-
age. Or just install a newer version of Emacs that includes easymenu.
From a buffer in Python mode, type C-h m. The online help for Python mode
appears. Plenty of commands are available; following are some of the most useful
ones to get you started.

Type C-c ! to open a Python shell. Type C-c C-c to execute the current buffer.

You can indent and un-indent a region with C-c > and C-c <. (You can mark a region
with the mouse, or press C-<space> to start marking a region and move the cursor
around.) Type C-c # to comment out a region. Python mode doesn’t have a key-
board command to uncomment a region (although it is available in the menu).
Therefore, you may want to use the “delete rectangle” command. Consider the start
and end of the current selection as two corners of a rectangle; typing C-X r d will
delete that rectangle.

Pythonizing other editors 

Python syntax-highlighting is available for Vim (VI iMproved). See
http://www.vim.org/syntax/python.vim for one specification file. In addition, if
you compile Vim with the +python feature, you can execute Python statements
from within Vim. See http://www.vim.org/html/if_python.html for an
explanation.

If you have another favorite source-code editor, you may be able to make it
“Python-aware” with proper indentation rules, syntax highlighting, and so forth.

The editor HOWTO (http://www.python.org/doc/howto/editor/editor.html)
offers some useful pointers.


Editing with IDLE 

I use IDLE for much of my Python development, and I’ve been quite happy with it.
This tutorial will get you up and running with most of IDLE’s features. If you like, fol-
low along in IDLE as you read to get a feel for the available editor commands. (I
know that I always need to try out new commands, to teach them to my fingers.)

Exploring the IDLE Python shell 

The first window IDLE opens is a Python shell. Here, you can explore Python com-
mands interactively, just as if you had run Python from the command line. IDLE also
provides some shortcuts to make your work easier.
For example, suppose I am writing code to retrieve Web pages, and I decide to try
out some functions from urllib. First I press Alt-F2, to make the IDLE window
expand (vertically) to fill the screen. Next, from within the shell, I import the
urllib module. To remind myself what the function urllib.urlopen does, I print
its docstring, but make a typo. Oops! IDLE won’t force me to retype the command,
though. To repeat the last command, I press Alt-P. Pressing Alt-P repeatedly cycles
through older commands; pressing Alt-N cycles through newer commands (useful if
you press Alt-P too many times!). Next, to scroll back to the typo quickly, I can
press Ctrl-Left-Arrow to move the cursor back, one word at a time.

Next, I start to call the function. Once I type the open-paren, IDLE pops up balloon
help to show the function signature and docstring. Figure B-1 shows my current
situation.



Figure B-1: IDLE with function signature displayed 


At this point, I remember there was another function in urllib, one that grabbed a
Web page to disk in one line. What was it called? Something starting with url... I’m
feeling too lazy to look it up in the documentation. I could always type print
dir(urllib) (or urllib.__dict__.keys()) to jog my memory, but instead I type
urllib.url and press Alt-/. The Alt-/ command completes typing half-finished names;
when I press it more than once, it cycles through each possibility. In this case, it
takes me to urllib.urllib, urllib.urlopen, and then urllib.urlretrieve. Ah, yes,
that’s the function I want!

By the way, if IDLE ever finds itself without a Python shell open, you can summon a
new one by choosing Python Shell from the File menu.
Navigating source code

You can cruise around in the source code with the arrow keys (or the mouse), Page
Up, and Page Down. You can move to the start of the line with Home (or Ctrl-A), or
to the end of the line with End (or Ctrl-E). Ctrl-Home and Ctrl-End take you to the
top and bottom of the file, respectively. To jump to a line, press Alt-G and type the
line number.

Ctrl-Left and Ctrl-Right move around the file one word at a time. Ctrl-Up and
Ctrl-Down move up and down one paragraph at a time. In addition, you can hold down
the Shift key while moving around to select a block of source.

Block commands 

Once you select a block of text, you can copy (Ctrl-C), cut (Ctrl-X), and later paste it 
(Ctrl-V). 

Select a block of code and press Alt-3 to comment it out; press Alt-4 to uncomment
it again. Note that comment lines do not count as un-indented lines for purposes of
control block structure.
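The note about comment lines can be checked directly: Python's tokenizer ignores comment-only lines when it tracks indentation, so a line commented out at column 0 does not end the block around it. The sketch assumes the ## prefix that IDLE's comment command inserts; the greet function is a made-up example.

```python
# A line commented out at column 0 inside a function body does not end
# the block: comment-only lines are ignored for indentation purposes.
source = (
    "def greet():\n"
    "    message = 'hi'\n"
    "##    message = 'bye'\n"    # commented out, starting at column 0
    "    return message\n"
)

namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
print(namespace["greet"]())     # the function still compiles and runs
```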

You can indent and un-indent (outdent) a block of code with Ctrl-] and Ctrl-[,
respectively. You can also tabify and untabify a block with Alt-5 and Alt-6,
respectively. I prefer to untabify (convert tabs to spaces) code mercilessly, and turn
Tab mode off with Alt-T, because different editors treat tabs differently.
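What untabify and tabify do to a block can be sketched in a few lines. This is an illustration, not IDLE's own code; the tab width of 4 is an assumption (IDLE lets you choose it), and tabify here converts only leading indentation.

```python
# Untabify: expand every tab to spaces, out to the next tab stop.
# str.expandtabs does the column arithmetic and restarts at each newline.
def untabify(text, tabsize=4):
    return text.expandtabs(tabsize)


# Tabify (the reverse): turn runs of leading spaces back into tabs.
# Only indentation is converted; spaces inside a line are left alone.
def tabify(text, tabsize=4):
    lines = []
    for line in text.split("\n"):
        body = line.lstrip(" ")
        tabs, spaces = divmod(len(line) - len(body), tabsize)
        lines.append("\t" * tabs + " " * spaces + body)
    return "\n".join(lines)


code = "def f():\n\tif True:\n\t\treturn 1\n"
print(untabify(code))
print(repr(tabify(untabify(code))))
```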

Searching and replacing 

Press Ctrl-F to search for text in a file, and F3 to repeat a search. Press Ctrl-H to
search and replace. Alt-F3 lets you search for text in files (much like running the
UNIX utility grep). The output goes to its own window. In that window, right-click a
line, and choose “Go to file/line” to jump to the file from which the line came.
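A find-in-files search reduces to walking a directory tree and reporting file, line number, and line text, which is the same triple IDLE's output window lets you jump to. A minimal grep-style sketch; find_in_files and its parameters are hypothetical names, and UTF-8 source files are assumed.

```python
import fnmatch
import os
import re
import tempfile


def find_in_files(pattern, top=".", file_glob="*.py"):
    """Yield (path, line_number, line) for every line that matches pattern."""
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in fnmatch.filter(filenames, file_glob):
            path = os.path.join(dirpath, name)
            # errors="replace" keeps one badly encoded file from stopping the search
            with open(path, encoding="utf-8", errors="replace") as f:
                for number, line in enumerate(f, start=1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")


# Usage: search a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as top:
    with open(os.path.join(top, "demo.py"), "w", encoding="utf-8") as f:
        f.write("import os\n\ndef main():\n    pass\n")
    for path, number, line in find_in_files(r"^def \w+", top):
        print(f"{os.path.basename(path)}:{number}: {line}")
```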

More IDLE shortcuts 

IDLE’s class browser lets you jump to a class or function definition with minimal
legwork. Press Alt-C to bring it up, poke around in the tree-browser of the current
module’s members, and double-click an entry to jump to that line of code.
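The information a class browser shows (classes, their methods, and top-level functions, each with a line number) can also be extracted with the standard ast module. This is a text-only illustration, not how IDLE builds its browser; outline is a hypothetical helper, and functions nested inside functions are deliberately skipped.

```python
import ast


def outline(source):
    """Return (line, kind, qualified_name) for classes, methods, and functions."""
    entries = []

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                entries.append((child.lineno, "class", prefix + child.name))
                visit(child, prefix + child.name + ".")   # descend into methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                entries.append((child.lineno, "def", prefix + child.name))

    visit(ast.parse(source), "")
    return entries


sample = "class Shape:\n    def area(self):\n        return 0\n\ndef main():\n    pass\n"
for lineno, kind, name in outline(sample):
    print(f"{lineno:4} {kind:5} {name}")
```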

(You’ll probably want to keep the class browser’s window handy, as pressing Alt-C
repeatedly can leave numerous orphaned class browsers lying around.)
The Help menu includes a very useful link to the local Python documentation.

The path browser enables you to easily browse all the directories in your Python
path. I don’t use it often, but if (for example) you ever find yourself importing the
wrong copy of a module or .pyd file, the path browser can show you where the
bogus one is coming from.
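The same question (which copy of a module wins?) can be answered from code. A minimal sketch, assuming Python 3, where importlib.util.find_spec locates a module without importing it; module_origin is a hypothetical helper.

```python
import importlib.util
import sys


def module_origin(name):
    """Return where a module would load from ('built-in' for built-ins), or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# The directories searched, in order; an empty entry means the current directory.
for entry in sys.path:
    print(entry or "(current directory)")

print(module_origin("json"))    # path to the copy of json that import would use
```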


Debugging with IDLE 

Suppose you want to test some code. First do a quick save (Ctrl-S), and then press 
F5 to run the program within IDLE. Listing B-1 illustrates some buggy code, for 
practice: 


Listing B-1: Buggy.py 


import os

def FindSourceFiles(Directory,Results = []): # Bug 2
    for FileName in os.listdir(Directory):
        Extension=os.path.splitextension(FileName) # Bug 1
        if Extension==".py":
            Results.append(FileName)
    return Results

print FindSourceFiles(os.curdir)

Path=os.path.join(os.curdir,"Lib")
print FindSourceFiles(Path)


When I run the program, Python quickly complains (and rightly so!) that there is no
such thing as os.path.splitextension. I bring up IDLE’s stack viewer by choosing
Stack Viewer from the Debug menu. (Actually, I cheated: I checked the Auto-Open
Stack Viewer option in the Debug menu, to save myself some time.) Note that the
Debug menu is available on the Python shell window, and not on source listing
windows. From the stack viewer, I can jump to a source-code line by double-clicking
it. I can also right-click a stack-trace line in the shell, and choose Go to file/line.
Figure B-2 shows IDLE’s stack viewer.
Figure B-2: Examining a call stack in IDLE
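The frames the stack viewer displays are also reachable from code through the traceback module. The sketch below reproduces Bug 1 in a small hypothetical function, find_extension, and lists the frames of the resulting AttributeError, outermost first; it assumes Python 3.

```python
import os
import traceback


def find_extension(file_name):
    # Deliberate bug, as in Listing B-1: os.path has no splitextension().
    return os.path.splitextension(file_name)


def stack_of(exc):
    """Return (function_name, source_line) for each frame, outermost first."""
    return [(f.name, f.line) for f in traceback.extract_tb(exc.__traceback__)]


caught = None
try:
    find_extension("Buggy.py")
except AttributeError as exc:
    caught = exc        # keep a reference; 'exc' is cleared after the block

for name, line in stack_of(caught):
    print(name, "->", line)
```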


I replace os.path.splitextension with os.path.splitext, which returns a (name,
extension) pair, so I unpack its result into two variables. One bug squashed [cue
victory chord]. But there’s another bug in this code: it runs (as long as the current
directory has a subdirectory named Lib), but it doesn’t give me what I want. I have
three files in the current directory, and another file in the Lib subdirectory. My
program’s second list of source files seems to include all the entries from the first
list, as seen in Listing B-2:


Listing B-2: Sample Buggy Output 


['Buggy.py', 'LessBuggy.py', 'NotBuggy.py']
['Buggy.py', 'LessBuggy.py', 'NotBuggy.py', 'FancyPrimeFinder.py']


How did those extra file names get in there? This looks like a job for the IDLE
debugger. From the Debug menu, I choose Debugger to open the debugger window.
This time, when I press F5 to run my script, execution pauses, and I can step
through it more carefully. (See Figure B-3.) The Step button executes the current
source line, stepping into any Python function calls. The Over button executes the
source line without stepping into subfunctions. The Out button keeps executing
until the current stack frame finishes. The Go button keeps executing until the
program finishes (or crashes), and the Quit button stops the program.



Figure B-3: Interactive debugging with IDLE 


In this case, I notice that in my second function call, Results is full of data right
from the start. How did this happen? Ah, yes: the default value for Results is created
only once, when the def statement runs, so Results is still a reference to the same
old list, and the list still has data in it! I print out id(Results) within the function.
The same object ID each time: the villainous bug is exposed, as seen in Listings B-3
and B-4:


Listing B-3: LessBuggy.py 


import os

def FindSourceFiles(Directory,Results = []): # Bug 2
    print id(Results)
    for FileName in os.listdir(Directory):
        (Name,Extension)=os.path.splitext(FileName)
        if Extension==".py":
            Results.append(FileName)
    return Results

print FindSourceFiles(os.curdir)

Path=os.path.join(os.curdir,"Lib")
print FindSourceFiles(Path)
Listing B-4: Less buggy output

9747060
['Buggy.py', 'NotBuggy.py', 'LessBuggy.py']
9747060
['Buggy.py', 'NotBuggy.py', 'LessBuggy.py', 'FancyPrimeFinder.py']
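The same behavior can be reproduced in isolation. In this stripped-down sketch (collect is a made-up function, Python 3 assumed), the default list is built once when the def statement runs, and every call that omits the argument appends to that one shared object, which is exactly what the matching id values in Listing B-4 reveal.

```python
# Bug 2 in miniature: the default list is created once, at definition time.
def collect(item, results=[]):
    results.append(item)
    return results


first = collect("Buggy.py")
second = collect("FancyPrimeFinder.py")
print(first is second, id(first) == id(second))   # one list, two names
print(second)
```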



I’ve learned my lesson: be careful when passing a list (or any other mutable
object) as a default parameter value! A safer alternative is to make None the default
value, and set Results to an empty list within the function:


def FindSourceFiles(Directory,Results=None):
    if Results is None:
        Results=[]
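Putting both fixes together gives one possible finished version. This is a sketch in Python 3 rather than the book's Python 2; the function name and behavior follow the listings, and the usage block builds a throwaway directory tree (the file names are illustrative) so it can run anywhere.

```python
import os
import tempfile


def FindSourceFiles(Directory, Results=None):
    """Return the .py file names in Directory, appended to Results if given."""
    if Results is None:
        Results = []            # a fresh list on every call that omits Results
    for FileName in os.listdir(Directory):
        Name, Extension = os.path.splitext(FileName)   # splitext returns a pair
        if Extension == ".py":
            Results.append(FileName)
    return Results


# Usage: a current directory with a Lib subdirectory, as in the text.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "Lib"))
    for name in ("Buggy.py", "NotBuggy.py", "notes.txt",
                 os.path.join("Lib", "FancyPrimeFinder.py")):
        open(os.path.join(top, name), "w").close()     # create empty files
    print(sorted(FindSourceFiles(top)))
    print(FindSourceFiles(os.path.join(top, "Lib")))   # no leftovers this time
```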



You can break the currently executing program with Ctrl-C ... usually. Sometimes,
there is no way to stop a program running under IDLE without stopping IDLE. Be
sure to save your work in every window before using IDLE to debug!


Editing with PythonWin 

The Python shell in PythonWin behaves much like it does in IDLE. Use Ctrl-Up and 
Ctrl-Down to cycle through old commands. Use Ctrl-Space to prompt PythonWin to 
suggest completions for names. In addition, PythonWin provides a list of available 
members when you type an object’s name; use the arrow keys (or the mouse) to
scroll through the possibilities, and then press Tab (or double-click a member 
name) to insert the name. 

To toggle between source code and the Python shell, press Alt-I. You can also cycle 
through windows with Ctrl-F6 and Shift-Ctrl-F6.

Editing source in PythonWin 

PythonWin can collapse blocks of code into a single line. This is a nice way to focus
on the code you’re interested in. Use the + and - keys on the numeric keypad to
expand and collapse a block; use the * key on the numeric keypad to expand and
collapse the whole file at once. A block’s status is indicated to the left of the source
line with a + or -; you can also click these to open and close the block. I recommend
turning Num-Lock off, and using the keypad arrows to scroll; this keeps your hand
right next to the “tree-keys.”
To go to a specific line number, press Ctrl-G and then type the line number.

To comment out a block in PythonWin, press Alt-3; to uncomment it, press either
Shift-Alt-3 or Alt-4. Block indent and un-indent are simply Tab and Shift-Tab.

Debugging with PythonWin 

To get some practice using PythonWin’s debugger, let’s fix some buggy code. Listing 
B-5 is an example of some code with bugs: 


Listing B-5: Buggy.py 


import os
import string
import random

LegalChars=string.letters+string.digits

# Create a temp file
LetterIndex=0
while LetterIndex<20:
    FileName=FileName+random.choice(LegalChars)
File=open(FileName,"w")
File.write("Test")
File.close
os.remove(FileName)


To run this code in PythonWin, I press F5. PythonWin complains about the undefined
variable name FileName. If the source window is maximized, the Python shell
(where the stack trace is displayed) won’t be visible; press Alt-I to jump to it.

I double-click the error in the shell window to jump to the corresponding source
code. (This is very useful when debugging a project with many files.) Then I add
code above the while loop to initialize FileName to an empty string, and press F5 to
run again.

I notice that my program is taking its own sweet time to execute. It looks like I may
have an infinite loop. To stop executing the program, I look to my system tray (in
my taskbar), right-click the PythonWin icon, and choose Break into Running Code.
A quick glance at my code shows that the while statement will never finish,
because LetterIndex is never incremented.
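With both of these fixes applied, the name-building loop terminates after 20 characters. The sketch below is written for Python 3, so it uses string.ascii_letters in place of the listing's Python 2 string.letters; for real temporary files, the standard tempfile module is usually the safer choice.

```python
import random
import string

LegalChars = string.ascii_letters + string.digits

FileName = ""               # fix 1: start from an empty name
LetterIndex = 0
while LetterIndex < 20:
    FileName = FileName + random.choice(LegalChars)
    LetterIndex += 1        # fix 2: advance the counter so the loop ends

print(FileName)
```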
Now the code runs, but the line os.remove(FileName) raises an OSError with the
message “Permission denied.” It seems there is another bug in the code. (You’ve
probably spotted it by now, but bear with me.)

To prepare for my debugging session, I set a breakpoint on the line that tries to 
remove the file. To set a breakpoint, press F9, or click the breakpoint hand icon on 
the toolbar. Next, I press F5 to run. When execution stops, I go to the Watch window 
(if it’s not showing, I click the glasses icon on the debugging toolbar to bring it up). 

I watch the expression FileName, and I watch the expression File. (See Figure B-4.) 



Figure B-4: Debugging within PythonWin 


The line highlighted in Figure B-4 ought to delete the file. Aha! My file is still open;
no wonder I can’t delete it! The statement File.close is simply a reference to the
close method of my file. I need to call File.close().
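The difference between referring to a method and calling it is easy to see with the file object's closed attribute. In this Python 3 sketch, the bare File.close looks up the bound method and discards it, leaving the file open; calling File.close(), or letting a with-block close the file, is what actually releases it. The temporary directory and file name are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Test.txt")

File = open(path, "w")
File.write("Test")
File.close              # just an attribute lookup: nothing is called
still_open = not File.closed
File.close()            # this actually closes the file

with open(path, "w") as File:   # closed automatically when the block ends
    File.write("Test")

os.remove(path)         # safe now: no open handle remains
os.rmdir(os.path.dirname(path))
print(still_open, File.closed, os.path.exists(path))
```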

Following are some other keys to keep in mind when debugging in PythonWin: 

F5          Continue running

F11         Execute the next statement, stepping into any subfunctions

F10         Execute the next statement, without stepping into subfunctions

Shift-F11   Finish executing the current stack frame

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Symbois & Numbers 

- character, 23 

in string formatting, 41 
! character, 141 
!= characters, 30 

# characters, 5 
$ character, 141 
% character, 7, 23 

in division operations, 7 
in string formatting, 40-41 
0 characters 

in calculations, 4 

in complex expressions, 32-33 

in regular expressions, 141 

* character, 23 

with fnmatchQ function, 164 
in regular expressions, 140 
repeating strings using, 37-38 
in string width fields, 42 
** characters, 23 
character 

in regular expressions, 140 
in string formatting, 42 
/ character, 23 
? character, 141 
[] characters, 30 

in regular expressions, 140 
\ character 

in regular expressions, 141-143 
in string literals, 35 
'' character, 23 

in regular expressions, 141 
_ character, 20 
{} characters, 141 
I character, 23, 32 
~ character, 23 
+ character 

as arithmetic operator, 23 
concatenating strings using, 37 
overloading, 108 
in regular expressions, 141 
in string formatting, 41 


< character, 30 
<< characters, 24 
<= characters, 30 
== characters, 30 
> character, 30 
>= characters, 30 

A 

aborto function, 184 

absO function, 24 

_abs_0 method, 112 

absolute paths, 155 

absolute value, calculating, 24 

abspathO function, 162 

abstract object layer (Python/C API), 556 

Abstract Syntax Trees (ASTs), 613-614 

AbstractFormatter, 306 

accelerators (wxPython), 411-412 

accepto method, 252 

accessO function, 156 

acquireO method, locking using, 485-486 

ActiveX Controls, embedding in xwPython, 414 

_add_0 method, 112 

addheaderO method, 313 

adding attributes, 101-102 

addition operator (+), 23 

addresses, e-mail, 310-311 

AddressList class, 310 

addstrO method (curses module), 416-417 

adler320 function, 211 

afterO function, 368 

afterO method, with Tkinter, 386 

aifc module, 456 

AlFF sound files 

handling chunked data, 460-461 
reading/writing, 456-461 
reversing (reverseSound.py), 459-460 
alarmO function, 192 
alias command (pdb module), 500 
alignment modifiers (struet module), 206 
allocateJockO method, 485-486 
alpha channels, in graphies files, 467 
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Alpha.py (module-tester), 93 
_ancl_0 method, 114 
AND operator (&), 23, 31-32 
anonymous (lambda) functions, defining, 90-91 
anydbm dictionary, 229-230 
Apache server CGI Scripts, 299 
appearance options (Tkinter widgets), 354 
appendo method, 58 
with arrays, 71 
with IMAP mailboxes, 288 
with wxPython, 411-412 
applyO function, 90 
arbitrary precision numbers, 587-588 
archives utility, 657 
arguments 

in exceptions, 81-82 
in functions/tuples, 89-90 
arithmetic operators, 4 
date arithmetic, 220 
joins using, 52-53 
arrays (array objects), 68-71 
array elements, 600 
attribute options, 600-601 
audio-editing program (quiet.py), 594-595 
converting to lists/strings, 592-593 
managing using Numeric Python, 597-600 
matrix arithmetic using (MovingAverage.py), 
603-604 

articleO method, 294 
articles, in newsgroups, 293-295 
ASCII values 

encoding binary data as, 317-319 
encoding in URLs, 276-277 
asctimeO function, 222 
asserto function, 83 
assertions, 83-84 
assignment statements, 26-28 
using with lists, 57-58 
AST objects, 613-614 
asterisk character (*) 

with fnmatchO function, 164 
in regular expressions, 140 
in string width fields, 42 
asjmchronous HTML page retriever 
(asyncget.py), 272-273 


asynchronous signals, handling, functions 
for, 191-193 

asyncore module, 271-273 
atexit module, 184 
atofO/atoiO/atolO functions, 139 
attributes. See also values 

in classes, managing, 101-102 
of functions, built in, 609-611 
of markup language tags, 326 
attronO/attroffO methods (curses), 417 
attrsetO method (curses), 417 
AU sound files 

reading/writing, 458 
reversing (ReverseSound.py), 459-460 
audio files. See sound files 
audio streams, editing (Quiet.py), 594-595 
audioop module, handling audio fragments, 
461-464 

auditing tables, 238-240 
augmented assignment statements, 38 
authentication, 278 

avgO function (audioop module), 461-462 

B 

b amnt statement (Lepto), 435 

backgrounds, terminal displays, creating, 418-419 

backslash character (\) 

in regular expressions, 141-143 
in string literals, 35 
bad.py_ (security test code), 519 
base classes 

extending, 104-106 
overloading, 109 

Base64 encoding (e-mail), 318-319 
BaseHTTPRequestHandler, 264-265 
BaseHTTPServer module, 264 
Bastion module/object, 520-521 
BeginDrawingO method, 409 
behavior options (Tkinter widgets), 354 
bkgdO/bkgdsetO methods (curses module), 418 
bidirectionalO function, 153 
binary data 

encoding as ASCII, 317-319 
reading/writing (struet module), 207 
storing, 195 

binary distributions/installers, 653-654 
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binary mode data storage, 195 
binary operations, 23 
bindO method, 251 
with widgets, 372 

bisect module, with sorted lists, 60 
Bitmapimage class (Tkinter module), using with 
Python Imaging Library, 475-476 
bitwise operators, 114-115 
BlackAdder development program, 690 
blocking, by sockets, 253-254 
BloodType.py, 342 

BloodTypeSax.py (Sax module), 335-336 

Boa Constructor development program, 690 

bodyO method, overriding in Tkinter dialogs, 381 

BoldOnly.py, 329-330 

Boolean operators, 31-32 

borderO method (curses module), 418 

bottienecks, locating, 505-509 

box sizers (wxPython), 403-405 

boxes/borders (curses module), 418 

brackets ([]), 30 

in regular expressions, 140 
break statements, 8 

with looping statements, 77-79 
breakfast buttons (FoodChoice.py), 352-354 
breakpoints, setting (pdb module), 499 
browsing, newsgroups, 293. See also Web browsers 
BSD data objects, 233-234 
bsddb module, 233-234 
buffer interface, 566-567 
bufferJnfoO method, 71 
Buggy.py (error-filled code), 695 
built-in data types/sequences, 49 
built-in functions 

attributes (table), 609-611 
globais, 96 
locais, 96 
openO, 122 
in Python/C API, 571 
buttonboxQ method, 381 
byte orders, 196 
converting, 71 
byteswapO method, 71 

c 

C/C++ code. See also Python/C API 

converting from Python code, 538-541 
converting Python data to, 532-538 


dictionary functions, 570-571 
embedding Python in, 541-543 
file objects, 571-572 
general object functions, 559 
handling empty values, 571 
handling Unicode strings, 567-569 
list functions, 564-565 
mapping functions, 569-570 
module objects, 572-574 
number functions, 559 
Python extension modules, 527-531 
reference counting, 513 
running Python code, 543-546 
sequence functions, 562 
tuple functions, 565-566 
type object function, 571 
C locale. See localization 
C Socket library, 248 

C structures, converting to/from, 204-207 
calculations. See operators 
calendarO function, 225 
calendar module, 224 
_call_0 method, 109-110 
call name statement (Lepto), 435 
callability, testing for, 110 
call-by-values, 88 

can_change_colorO function (curses module), 428 
canvas widgets (Tkinter module), 366-367. 

See also widgets 

capitalization, methods for, 134-135 
capwordsO function, 139 
cards.py (random number generator), 585 
caret Symbol (''), 23 

in regular expressions, 141 
case-sensitivity, 5 
of identifiers, 19 
categoryO function, 153 
centerO method, 134 
egi module, 298-302 
CGI Scripts 

Python support for, 267-269 
writing/managing, 298-302 
CGIDebug.py, 301-302 
CGIHTTPRequestHandler class, 267 
CGIHTTPServer module, 264 
channels (sound files), 453 
character categories (string module), 138-139 
character data type, 40 
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character groups, in regular expressioris, 143 

character sets, 150 

characters 

accessing in strings, 38-40 
reading individual characters, 121 
speciai, in terminal displays, 422-423 
chdirO function, 165 
checkO method GMAP4), 287 
checksum, computing, 211 
child classes, 14, 102-104 
child processes, running, 181-183 
chmodO function, 157 

choosecolor.py (color system conversions), 
471-472 
chrO function, 45 

chunked data, reading/handling in sound files, 
460-461 

circular references, 65 
class data type, 107 

classes, class objects. See also specific classes and 
objects 

accessing members of, 15-16 
base, extending, 104-106 
base, overloading methods, 109 
browsing, 609 
child classes, 102-103 
class data type, 107 
class statements, 100 
class variables, 100 
creating, 100-101 
customizing/extending, 104-106 
defining, 15, 100, 107 
as exceptions, 82 
hiding data in, 106-107 
instance objects, 101 
managing attributes in, 101-102 
parent/child classes, 14 
protecting, 520-521 
retrieving string name, 108 
speciai members, 101 
variables, 100 
ClassType data type, 68 
clearcacheQ function, 174-175 
clipboard, with wxPython, 413 
clockO function, 220-221 
clockgif.py (PIL Draw object), 478-479 


closeO, 123 

with child processes, 181 
with file descriptors, 173 
with GzipFile, 215 
with mmap objects, 176 
with shelve object, 203 
ClosestPoint.py, 77-78 
closing. See also exiting 
file objects, 123 
processes, 183-185 
sockets, 251 
Cmd class, 440 
cmd module, 433, 440-445 
cmdloopO method (cmd module) 
cmpO function, 109 

with file comparisons, 171-172 
with string comparisons, 43 
_cmp_0 method, 109-111 
CObjects, 574 
code, debugging, 497-501 
code testing tools, 502-505 
error tracebacks, 605-608 
exceptions, 81-83 

Interactive DeveLopment Environment for, 
695-698 

locating bottlenecks, 505-509 
pdb for, 497-501 
code, executing 
assertions, 83-84 
exec statement, 97 
flow controi (if-statements), 73-74 
for-statements, 74-75 
Game of Life example, 84-86 
looping statements, 74-79 
performance statistics for, 507-508 
reference counting, 512-513 
running from C, 543-546 
self-examining code (introspection), 608-611 
while-statements, 79 
code, imported, setting aside, 14 
code, Python 

browsing classes/functions, 609 
browsing functions, 609-611 
checking indentation, 611 
converting to C, 531-532 
disassembling, 615-616 
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editing tools, 692-699 
tokenizing, 611-613 
codec module, 151-152 
coerce functionO, 25 
_coerce_ method, 115 
coercing nurabers, 24-26, 115 
color name statement (Lepto), 435 
color options 

curses module, 427-428 
Tkinket widgets, 354, 365 
color pairs (curses module), 427-428 
color palettes, 467 

color_pair0 function (curses module), 428 
color scheme customizer (ColorChooser.py), 
377-381 

color System conversions, 469-472 
ColorChooser.py, 377-381 
colorsys module, 470-471 
column types, in databases, 240-241 
combiningO function, with Unicode strings, 153 
command-line interpreter, creating, 440-442 
command-line parameters, viewing, 166 
command prompt, 4 
running programs, 6 

Common Gateway Interface. See CGl Scripts 
commonprefixQ function, 164 
Communications, multicasting (multitest.py), 
257-261. See also e-maii; Internet 
Communications 
comparing 

comparison functions, 30-31 
comparison operators, 29-30 
files, 171-172 
identity references, 63-64 
rich comparison methods, 110-111 
sequence data types, 53 
strings, 42-43 
compileO function, 97 
compiling 
modules, 95 

regular expressions, 144-146 
Complaint.py, 382 
complex expressions, 32-33 
complexO function, 45 
_complex_0 method, 115 
complex numbers 
combining, 24 

in math module functions, 583 


components. See widgets 
compound expressions, 31-32 
compressO function, 211 
compressing data, 196 
graphics files, 467 
gzip module, 213-214 
PyZipFile class, 216 
zipfile module, 214-215 
Ziplnfo class, 215-216 
zlip module, 211-213 
concatenating 

data types, 52-53 
strings, 37 

Conceal.py (file-hiding program), 318-319 
concurrency control (thread module/threading 
module), 485-488 

Condition class (concurrency control), 488 
conditional statements, 7-8 
ConfigParser module/object, 188-190 
configuration files, managing, 188-190 
connecto method, 234, 251 
connectionO method, 289 
connection objects, in databases, 234 
constraints, layout (wxPyton), 406-407 
constructors, 15 
containers, pickling, 198 
_contains_0 method, 113 
ContentHandler object, 334-335 
contiguous arrays, identifying, 592 
continue statements, with loooping statements, 
77-79 

control blocks, 6 

control flow using if statements, 73-74 
Controls (wxPython module), 399-401. 

See also widgets 

converto method, with graphics, 474-475 
cooked mode (curses module), 421 
cookie dictionary, creating (httpreq.py), 149-150 
cookies (cookie module) 

importer for (CookieMonster.py), 323-324 
managing/storing, 322-324 
coordinates, in wxPython, 402-403 
copyO, 63 

copying objects, 65-67 
with 1MAP4 objects, 287 
path management, 168 
in Python Imaging Library, 474 
copy module, 66-67 
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copyfileO function, 168 
copyfileobjO functions, 168 
copying, graphics/images, 474 
copymodeO function, 168 
copytreeO function, 168 
countO method, 136 
with arrays, 71 
with lists, 58-59 

counting references, object, 64-65 

cPickle module, 198 

crc320 function, 211 

createO method, for mailboxes, 287 

cropO method, resizing images using, 476 

CSV.py (testing example), 502-503 

curly hrackets ({)), in regular expressions, 141 

curselectionQ method, 375 

curses module, 121 

color options, 427-428 
cursor options, 420-421 
handling terminal displays, 415-416 
managing text, 416-417 
maze game (maze.py), 429-432 
starting up/shutting down, 416 
text editing options, 426-427 
User input options, 421-425 
window/screen displays, 417-420 
Windows management, 425-426 
CurseWorld.py, 416 
cursors 

on curse-based terminal displays, 420-421 
in databases, 235 
Tkinter module options, 385-387 
with wxPython, 413 

customized exceptions (Pyton/C API), 578 
CXX, SCXX (Simplified CXX), 549-550 

D 

data 

audio. 460-464 
graphics, 472 
hiding, 106-107 
data storage 

byte order (endianness), 196 
compressing data, 196, 210-216 
destination issues, 196 
end User issues, 196 
object state, 196 


saving objects to disk (pickling), 197-200 
text versus binary mode, 196 
inXML, 195 
data types 

adding pickling support, 199 
built-in, 67-68 
class, 107 

dictionaries for (win32all), 661-662 
instance, 107 

packing/unpacking, 208-210 
printing listing of, 67-68 
sequence, 49 
in win32all, 661-662 
data types, numeric 
combining, 24 

comparison functions, 30-31 
comparison operators, 29-30 
converting from string data type, 44^5 
converting to string data type, 45-47 
floating point numbers, 22 
functions for, 24-26 
imaginary numbers, 22 
integers, 21 
long integers, 21-22 
using operators with, 23-24 
data types, string 

accessing characters/substrings, 38-40 

character data type, 40 

converting from numeric data type, 45-47 

converting to numeric data types, 44-45 

formatting, 40-42 

length, 35 

string comparisons, 42-43 
string literals, 35 
databases, relational 

accessing, concurrency issues, 485 
auditing tables, 238-240 
column types, 240-241 
connection objects, 234 
cursor objects, 235 

database libraries, viewing Information 
about, 242 

dbm objects, 229-231 
error hierarchies/exceptions, 243-244 
input/output sizes, 241 
metadata, 237-240 
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saving objects into, 203-204 
SQL statement parsers, 242 
transactions, 234-235 
viewing Information about, 242 
DatagramRequestHandler, 264 
date arithraetic, 219-220 
dates 

formatting, 222-223 
handling in wxPython, 413 
searching for, 225 

dayiight savings time, handling, 226 
DB API. See Python Database API 
dbhash module, 229, 233 
dbm module/dbm objects, 229-232 
deadlock, preventing, 488-489 
deathray.py (curses module), 424-425 
debugging. See also error handling 
destructors, 500-501 

Interactive DeveLopment Environment for, 
695-698 

Python code, 497-501 
decimalQ function, 153 
decodeO function, 312 

uuencode algorithm, 317-318 
DecoderRing.py, 75-76 
decompositionO function, 153 
decompressQ function, 211 
deepcopyO function, 66-67 
deep copying, 66 
def FunctionName statements, 6 
def statements, 87-88 
defaultsQ method, 189 
defining 

exceptions, 82 
functions, 5-6, 87-91 
new classes, 15 
delQ method/function 
with dictionaries, 61 
limitations of, 512 
with list items/slices, 58 
with object references, 65 
_del_0 method, 109-110 
with widget listboxes, 375 
deleting. See also removing 
file contents, 124-125 
list items or slices, 58 
_delitem_0/_delslice_0 methods, 113 


derived classes. See child classes 
destructors, finding errors in, 500-501 
development tools 
BlackAdder, 690 
Boa Constructor, 690 
Emacs editing tools, 691-692 
Interactive DeveLopment Environment (IDLE), 
689-690, 692-695 
PythonWorks, 690 
WingIDE, 690 

device context classes (wxPython), 408-411 
dialog/message boxes (Tkinter module), 361 
customizing, 381-382 
text editor example, 362-365 
dialogs, built-in (xwPython), 407-408 
dictionaries, 10-11 
accessing, 61 
adding to/replacing, 61 
disk-based, 229-231 
environ, 165 

formatting strings using, 41 
namespaces, 95, 97 
pickling, 198 
updating, 62 

dictionary objects (Python/C API), 570-571 
dictionary operators, overloading, 112-113 
digests, message fingerprints, 521-523 
digito function, 153 

dirO function, viewing module contents, 92 
dircmp class, 171-172 
directories (os/os.path modules) 
changing, 165-166 
creating, 169-170 
functions for, 163-164 
viewing working directory, 165 
disO function, 616 
dis module, 615-616 
disassembling Python code, 615-616 
disk-based dictionaries, 229-231 
dispatcher class, 271-273 
displays, terminal, handling (curses module), 
415-432 

distributing applications 
binary distributions, 653 
controlling files in, 648-649 
customizing setups, 650 
non-Python files in, 648-650 


Continued 
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distributing applications (continued) 
package distributions, 647-648 
simple distributions, 643-647 
source distributions, 648-653 
standalone executables, 655-657 
disutils module 

distributing extension modules, 650-651 
package distributions, 647-648 
simple distributions (timeutil.py/setup.py), 
643-647 

source/binary distributions, 651-653 
_div_0 methods, 112 
division calculations 

modulo operator (%) in, 7 
function for, 23, 25 
division operator (/), 23 
divmodO function, 25 
divmodQ method, 112 
dl module, with C shared libraries, 675 
DNS (Domain Name System), 248 
docstrings, 87, 501 
doctest module, 502 

Document Object Model API. See DOM API 
Document Type Descriptors (XML format), 326 
documentation, creating and maintaining, 501 
eo_EOF0 method (cmd module), 441 
do_helpO method (cmd module), 441 
do_shellO method (cmd mdoule), 441 
dollar sign ($), in expressions, 141 
DOM (Document Object Model) API, 338 
data exchange using (XMLDB.py), 340-341 
DOM nodes, 338-339 
elements, attributes, text, 338-339 
parsing XML files, 327, 338 
Domain Name Servers (DNS), 248 
domain names, 248 

host name, address functions, 248-250 
dotted notation, accessing packaged modules, 96 
downloading files using FTP, 290-291 
drag-and-drop operations 
Tkinter support, 382-385 
with wxPython, 413 
drawing. See also graphics/image files 
boxes, in curses module, 418 
Draw objects, 477-479 
Tkinter module widgets, 366-367 
xwPython device contexts, 409-411 


drawing canvas, creating (Fvents.py), 373-374 
DTDHandler class (XMLReader), 337 
DTDs (Document Type Descriptors), 326. 

See also XML format 
dumbdbm module, 229-230 
dumpimp.py (dummy Importer), 637 
dumpsO function, 197 
dupO function, 173 
dup20 function, 173 

dynamic extension module linking, 531-532 

E 

e constant, in math module, 581 
e-mail 

encoding/decoding, 317-319 
IMAP protocoi for, 285-288 
parsing messages, 309-310 
POP3 protocoi for, 281-283 
SMTP protocoi for, 283-285 
viewing/storing addresses, 310-311 
echo/no echo functions (curses module), 422 
editing text 

curses module options, 426-427 
wxPython Controls, 401 
elements, in XML, 326 
else-blocks (elif-blocks) 
with except clauses, 81 
with if statements, 73-74 
else-clauses, else-statements, 8 
with fordoops, 77-78 
with while-loops, 79 
embedded Python, 528 

embedding in C/C++ programs, 541-543 
empty values, in Python/C API, 571 
encodeO function, 317-318 
FncodedFileO function, with non-ASCll strings, 152 
encoding 

e-mail messages, 317-318 
sound files, 453 
text files, 150 

encrypted modules, importing, 633-636 
encryption tools, 523-524 
endO method, 148 
end statement (Lepto), 435 
EndDrawingO method, 409 
endianness. See byte orders 
endswithQ method, 136 
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EntityResolver class (XMLReader), 337 
environ dictionary, 165 
environmental variables 
PythonPath, 94 
viewing, 165 
epochs, 219 

EpochSeconds, converting from/to, 221 
_eq_0 rnethod, 109 
equality operator (==), 30 
eraseO method (curses module), 418 
errno module/error messages, 190-191 
error handling 

assertions for, 83-86 
code debugging tools, 695-698, 699-700 
ConfigParser object, 189 
debugging CGI Scripts, 301 
debugging code using pdb, 497-501 
exceptions, 81-83 
locating bottlenecks, 505-509 
ZipFile objects, 215 
error messages (exceptions), 5, 80-82 
in C/C++ conversions, 537 
with ConfigParser object, 189 
databases, 243-244 
formatting, viewing, 606-607 
FTP object, 291 

handling in Python/C API, 576-579 

IMAP object, 288 

IOErrors, 81-82

NNTP object, 295 

os module errors, 190 

PicklingErrors, 200 

raising, 82 

SAX exceptions, 337 

shlex module, 437 

SMTP objects, 284 

socket connections, 251-252, 254

swallowing, 607-608 

with syntax errors, 97 

tracebacks, 605-608 

in win32all, 662 

ErrorHandler class (XMLReader), 337 
escape() function, 147
escape sequences, in strings 
formatting strings using, 36-37 
valid, listing of, 36 


eval() function, 97

Event class (concurrency control), 487-488 
event handlers/objects, 371-373 

for curses-based terminal displays, 423-424
except clauses, 81 
exceptions. See error messages 
exclamation point (!) character, in regular 
expressions, 141 
exec() functions, 180-181
exec statement, 97 
executemany() method, 242
executing code. See code, executing 
exiting 

from functions, 88 
from processes, 183-185 
from Python, 4 
expandtabs() method, 134
expanduser()/expandvars() functions, 163
exponentiation, functions for, 582 
expressions, 29-33 
expunge() method, 287
extend() method
with arrays, 71 
with lists, 58 

Extensible Markup Language. See XML format 
extension classes, 550, 650-651 
extension modules, 527-528 
add/count functions, 529-530 
distributing, 650-651 
linking into Python, 531-532 
Numeric Python (NumPy), 589 
extension tools, Python-C interfaces 
CXX, SCXX (Simplified CXX), 549-550 
SWIG, 546-549 

extensions, for regular expressions, 143-144 

F 

f amnt statement (Lepto), 435 
FancyURLopener, 277-278 
FAQs, answers to, 686 

fcntl module for UNIX file descriptors, 680-681 

fdopen() function, 173

feedback.py (CGI feedback form), 300-303 

fetch() method, for IMAP4, 286

FieldStorage objects (CGI), 299-299 

file descriptors, 173-174 
file formats, 467

converting between, 208-210 
of databases, incompatibility, 232 
graphics file, 467-469 
mapping to MIME types, 316 
.pyc files, 631
sound files, 456-463 

file systems (UNIX), viewing information about,
677-678 

file viewer, creating (wxPython), 396-398 
filecmp module, 171-172 
FileInput class, 176-177
filelike objects, 126-127 
fileno() method, 123

with sunaudiodev module, 455 
files (file objects) 
closing, 123 
comparing, 171-172 
compiling, 95 

configuration files, 188-190 
creating/opening, 11 
filelike objects, 127-129 
hiding, 318-319 
navigating, 123-124 

non-Python, including in distributions, 648-650 
opening, 122 
printing to file, 120 
in Python/C API, 571-572 
reading contents of, 125-126 
softspace attribute, 124 
transferring using FTP, 290-291 
viewing filenames, 163 
viewing current positions in, 123 
writing to, 124-125 
files, managing 

file descriptor functions, 173-174 
file input class, 176-177 
filecmp module functions, 171-172 
fnmatch module functions for, 164 
glob module functions, 165 
mmap objects, 175-176 

os/os.path modules functions, 163-164, 168-169 
tempfile module functions, 170-171 
viewing information about, 159-161 
finally clauses, raising exceptions using, 82-83 
find() methods, 136

with mmap objects, 176 
findall() method/function, 145, 147


findfactor() method (audioop module), 462
fingerprints, for messages, 521-523 
finish() method, 263
fire() method, 103-104
fix() function, 153
flags, for file descriptors, 173 
float class, 7 
float() function, 7, 44
__float__() method, 115
floating point (decimal) numbers, 7, 22, 24 
managing, 154 
flush() function, 176
flush() method, 125
in curses module, 422 
with sunaudiodev module, 455 
fnmatch module, file/directory management 
functions, 164 

fnmatchcase() function, 164
Folder objects, 320-321 
fonts 

Internet text options, 305-306 
Tkinter options, 366
wxPython options, 413-414 
FoodChoice.py, 352-354 
for-statements, 8, 74-77. See also looping 
with list comprehension, 51 
with lists or tuples, 55 
form fields, accessing, 299-300 
fork() function, 182-183
formatter module, 304-306, 327 
formatting 

Internet text, 305-306 
locale-specific formatting, 625-626 
time, syntax for, 222 
using user input, 357-359
formatting strings 

escape sequences for, 36-37 
formatting characters (tables), 40 
methods for, 134-135 
preserving formatting in, 35 
struct module format characters, 204
fpformat module (floating point numbers), 154 
fragments, audio (audioop module) 
converting between formats, 463 
managing, 463-464 
frames, 349, 393

audio, managing, 457 
freeze utility, 656 
from function, 250-251

from() methods, 70

fromlist() method, 70

fromstring() method, 70

fstat() function, with file descriptors, 173

FTP object, creating/using, 289-291 

ftplib module, 289-291 

ftruncate() function, with file descriptors, 173
fully qualified domain names, retrieving, 250 
func() function, 164

FunctionAttributes.py (checking version numbers), 
610-611 

functions. See also specific classes and objects 
arguments in, 89-90 
array objects, 68 
browsing attributes, 609-611 
defining in code, 5-6 
overloading, 108-111 
pickling, 198 

seeding arrays with, 598-599 
testing, 502-503 
writing in C/C++, 527-531 

G 

Game of Life example (LifeGame.py), 84-86 
gdbm module, 229, 232 
geometry managers, widget layout, 349-350 
get() methods

accessing dictionary mappings, 61 
with ConfigParser object, 189-190 
with e-mail messages, 309 
opening Web browsers using, 308 
with Telnet object, 297 
with widget listboxes, 375 
with wxPython controls, 400
get_0 methods, 297 

with curses module cursors, 420-421
getatime() function, 158
getattr() function, 102
__getattr__() method, 109-110
getch()/ungetch() functions, 121, 669-670
getch() method (curses module), 422
getcwd() function, 165
getfqdn() function, 248-250
gethostbyaddr(), 248-259
gethostbyname() function, 248
gethostname() function, 248-250
__getitem__() method, 113


getkey() method, 422
getline() function, 174-175
getmtime() function, 158
getname() method, 460
getpass module/getpass() function, 516-517
getpeername() method, 253
getsample() method (audioop module), 462
GetSelection() method, 400
getservbyname() function, 248-250
getsignal() function, 192
getsize() method, 460
getSocket() function, 117
getsockname() method, 253
getsockopt() methods, 254-255
getstr() method, 422
getuser() function, 516
getweakrefs() function, 116
getweakrefcount() function, 116
getwelcome() method, 289
glob() function, 165
glob module, 165 
global interpreter lock, 496 
global namespaces, 95 
GlobalDict, 97 
globals built-in function, 96 
gmtime() function, 221
Gopher protocol, 291-292 
gopherlib module, 291-292 
graphical user interfaces. See GUIs
graphics/image files. See also drawing 
animating (CanvasBounce.py), 368-369 
converting to bitmaps, 475-476 
creating GIF images (clockgif.py), 477-479 
file formats for, 467, 474-475 
with GUIs, 366-367

handling using Python Imaging Library, 472-475 
handling in wxPython, 413 
identifying file types, 468-469 
modifying pixel data in, 476-477 
resizing, 476 

grayul.py (HTML file viewer), 396-398 
greater than operator (>), 30 
greater than, equal to operator (>=), 30 
grid method options, 351-352 
grid sizers (wxPython), 405-406 
group(), 148

accessing newsgroups using, 293 
groups, checking in UNIX systems, 671-672 
groups() method, 148

grp module, in UNIX systems, 672

__gt__() method, 109-110

guessing game program (NumberGuess.py), 73-74 
guess_type() function, 316
GUI-based applications, file-like objects in, 128-129
GUIs (graphical user interfaces). See also Tkinter; 
wxPython 

appearance/behavior options, 354 
color options, 354, 365 

color scheme customizer (ColorChooser.py), 
377-381 

dialog/message boxes in, 361-365 
event handlers, 371-372
font options, 366 
graphics/images in, 366-369 
incorporating user input, 356-359 
Lepto-based interfaces, 433-450 
menu widgets, 360-361 

printing exception tracebacks (GUIErrors.py),
607 

size options, 354 
text widgets, 359-360 
GUIErrors.py, 606-607
gunzip module, 213 
gzip module, 213-214 

H 

handle() method, 263
HandleForm.py (CGI script), 267
handlers, for asynchronous signals, customizing,
191 

hasattr() function, introspection using, 608-609

has_colors function (curses module), 427 

hash() function, 62

hash() method, 109-110

__hash__() method, 110

hashability, 62

has_key() method, 61

header values, e-mail message, retrieving, 309-310 
HelloWorld.py (CGI script), 298-302
help systems

in cmd module, 441-442 
newsgroups, 687 

technical assistance Web sites, 686 
tutorials, 17


__hex__() method, 115

hexadecimal values, converting strings to, 75-76 
hives, in Windows registry, 664 
HLS (hue-lightness-saturation) color system, 470
converting to RGB color system 
(ChooseColor.py), 471-472 
HomePage.py, 666 
host names, 248 

HSV (hue-saturation-value) color system, 470 
HTML files 

converting Python code to, 612-613 
filtering text in, 329-330 
handling in wxPython, 414 
parsing, 327-329 

viewing in wxPython (grayui.py), 396-398 
HTML markup language, 325 
html module (wxPython), 414 
htmllib module, 327 
HTMLParser class 

filtering HTML text (BoldOnly.py), 329-330 
handling bogus/unknown elements, 329 
parsing methods, 327-329 
Web robot (Robot.py), 331-334
HTTP() method, 279
HTTP request file (httpreq.py), 149-150 
HTTP requests, sending/receiving, 279-280 
httplib module, 278 

Hypertext Markup Language. See HTML files 
hypotenuse, calculating, 582 

I 

__iadd__() method, 112
id() function, 64
identifiers 

reserved words, 20 
valid versus invalid identifiers, 19-20 
identity references, comparing, 63-64 
__idiv__() method, 112

IDLE. See Interactive DeveLopment Environment 
if blocks, setting aside code using, 14 
if-statements, 8 

else-blocks (elif-blocks) with, 73 
with list comprehension, 51 
ignore() function, 172
ihave() method, 295
__ilshift__() method, 114
ImageDraw module (Python Imaging Library),
477-479 

images, adding to GUIs, 366-367. See also 
graphics/images 

ImageTk module (Tkinter module), using with
Python Imaging Library, 475-476
imaginary numbers, 22 
IMAP4 objects, 285-288
imaplib module, 285-288 

imghdr module, identifying image types, 468-469 
immutable data types, 10 
strings as, 38, 133 
__imod__() method, 112
imp module, 629-631 

importing Python modules, 631-633 
__import__() function
overriding, 94
using, 629-631 

import statements, 14, 93, 629 
Importer class, dummy custom Importer 
(dumbimp.py), 637 

importing 

encrypted modules, 633-636 
Python modules, 14, 92-93, 629-631 
importpye.py (importing modules), 634-636 
imputil module 

with encrypted modules, 633-636 
Importer class, 636-637 
__imul__() method, 112
in operator (string comparisons), 43 
inch() method (curses), 417
include file statement (Lepto), 435 
indenting 

code, conventions for, 6 
function definitions, 611 
index() method
with arrays, 71 
with lists, 58 
with strings, 136 
index names, stat module, 160 
indexes 

accessing sublists using, 9-10 
with lists, 9 

support for by sequence types, 9-10 
indexing, array elements, 590-592 
inequality operator (!=), 30 


inheritance 

child classes, 102-103 
multiple inheritance, 14, 103-104 
initialization methods, 100, 109 
initscr() function, 416
inodes, 158 
input() function, 120

with FileInput objects, 176
input. See also GUIs; user input
audio files, reading, 457 
functions for, reading, 120-121 
redirected, detecting, 128 
wxPython module options, 411-412 
input/output sizes, in databases, 242 
insert() method, 59
with arrays, 71 
with widget listboxes, 375 
installers, 653-654 

for standalone applications, 657 
instance data type, 107 
instance variables, 101 
instances of classes, 14 
InstanceType data type, 68 
instrO method (curses module) 
int() function, 44
__int__() method, 115
integers, 21 
interact() method, 296

Interactive DeveLopment Environment (IDLE), 4, 
689-690 

debugging using pdb, 695-698 
editing code using, 692-695 
interfaces. See also GUIs 

to GNU Multiple Precision Arithmetic Library, 
587-588 

with NIS “Yellow Pages,” 682-683 
Python - C/C++, 546-550 
internationalization, 619-624 
Internet Explorer, as home page, 666
Internet communications
formatting text, 304-307 
managing URLs, 303-304 
protocols for, 275, 303-307 
interweaving threads, 495 
introspection, 608-611 
invalid identifiers, examples of, 19 
inversion operator, 23

__invert__() method, 112

I/O (input/output) calls, optimizing, 510-511 

I/O sizes, in databases, 242 

IOError exceptions, 81-82

IP addresses, 247 

is operator, in reference comparisons, 64 

isabs() function, 157-158
isalnum() method, 135
isalpha() method, 135
isatty() function, with file descriptors, 173
isatty() method, 123, 128
isdigit() method, 135
isdir() function, 158
isfile() function, 158
isinstance() function, 68, 107
isleap() function, 225-226
islink() function, 158
islower() method, 135
isspace() method, 135
__isub__() method, 112
issubclass() function, 68, 107-108
istitle() method, 135
isupper() method, 135

J 

join() function, building paths, 161-162
join() method, 138
joinfields() function, 140
joining sequences, 52-53 

K 

Key Bindings (readline module), 676 
key names, in string formatting, 41 
key-value pairs, 10-11 
keyboard event bindings, 371-372 
keyboard input 

accessing, 120-121 

with curses-based terminal displays, 421-422

in wxPython, 412 

keyboard shortcuts, in wxPython, 411-412
keypad() method (curses module), 421
keys, 10-11 

in Windows registry, 664 
keyword arguments, unpacking in C/C++ 
conversions, 537-538 
Kill and Yank Key Bindings, 677 


killing (readline module), 677 
KillKey.py, 667-668 

L 

l amnt statement (Lepto), 435
lambda (anonymous) functions, 90-91 
languages, Lepto, 435. See also C/C++ 
last item on list, accessing, 9 
layout (wxPython) 
algorithms, 407 
constraints on, 406-407 
options for, 401-406 
__le__() method, 109
leap years, 226 
left bit-shift operator (<<), 24
len() function, 35
with arrays, 71 
with dictionaries, 62 
__len__() method, 110, 113
Lepto-based interfaces 

graphical interface for, 445-450 
interactive console for (leptocon.py), 442-445
Lepto language basics, 435
Lepto Lexical Analyzer, 436-440 
parser for (leptoparser.py), 437-440 
simple example of (leptogui.py), 445-450 
LeptoCon class (cmd module), 442-445 
leptogui.py, 445-450
leptoparser.py, 437-440
less than operator (<), 30 
less than or equal to operator (<=), 30 
LessBuggy.py, 697-698 
lexical analyses, shlex module for, 436-440 
libraries, C shared, using in UNIX systems, 675. See 
also specific classes and objects 
limits, UNIX system resources, 679-680 
linear encoding (sound files), 453 
linecache module, 174-175 
lineno() function, 177
link() function, 168-169
links 

for extension modules, 531-532 
managing, functions for, 158 
symbolic/hard system links, 168 
Linux RPM SPEC options, 654 
list command (pdb module), 498 
list comprehensions, 51 
list() function, 50
list() method, 10
with e-mail, 281 
listbox widget, 375-376 
listdir() function, 163-164
listen() method, 251-252
listenThread() function, 203
lists (list objects) 

accessing last item on, 9 
C functions for, 563-564 
converting arrays to, 69-70, 592-593 
creating, functions for, 50-52 
in dbhash module, 233 
deleting items or slices from, 58 
for...in statements with, 55
index numbers for, 9 
methods for, 58-60 
performance issues, 68 
pickling, 198 

processing functions, 55-57
replacing values in, 57 
sorted, managing items in, 60 
switching to tuples from, 10 
ljust() method, 134
loads() function, 197
local namespaces, 95 
LocalDict, 97 
locale module 

formatting options, 625-626 
locale categories, 624-625 
locale properties, 626-627 
locale-specific formatting. See localization 
localhost addresses, 247
localization, 619, 624-625 
time formats, 221, 223-224 
locals built-in function, 96
Locator class (XMLReader), 337 
Lock class (concurrency control), 486 
locking, global interpreter lock, 496 
locking threads, 485-488 

preventing deadlock, 488-489 
logarithms, calculating, 582 
login()/logout() methods (IMAP), 285-288
long() function, 44
__long__() method, 115
long integers, 21-22, 24 


looping statements, 7 

break-statements with, 77-78 
breaking out from, 8 
changing reference sequences in, 78-79 
continue-statements with, 77-78 
else-clauses with, 77-78 
optimizing, 510 
while-statements with, 8, 79 
loose typing, 4-5 
lossless compression, 467 
lossy compression, 467 
lower() method, 134

lseek() function, with file descriptors, 173

__lshift__() method, 114

lstat() function, 161

lstrip() method, 134

__lt__() method, 109-110

M 

MagicSquare.py (using ufuncs), 596 
MainLoop() method, 393
mailbox module, 320-321 
mailboxes 

administering, 287-288 
managing/searching, 285-286 
MH, managing, 320-321 
UNIX, parsers for, 320 
mailcap files, parsing, 317 
mailcap module, 317 
mailing lists about Python, joining, 687
makedirs() function, 169-170
maketrans() function, 139
managed windows (wxPython), 394-395
mapping() function, 117
mapping objects (Python/C API), 569-570 
mappings, dictionary 
accessing, 61 
adding to/replacing, 61 
updating, 62 

mappings, of MIME type file extensions, 316 
marked parameters (SQL statements), 242 
markup languages, 325. See also HTML files; 

XML format 
marshal module, 200 
Mask class (curses module), 419-420 
masked arrays, 589 

mask.py (terminal display screen mask), 419-420 
match() function, with regular expressions, 147

match() method, 145

match objects, methods for, 148 

matching 

nongreedy, 143 
regular expressions, 145 
math module 

exponent calculations, 582 
logarithm calculations, 582 
rounding, 581-582 
trigonometric functions, 582-583 
matrix operations, with arrays, 603-604 
max() function, 31, 55

with string comparisons, 43 
maze game (maze.py, curses module), 429-432 
MD5 message digest algorithm, 522
membership testing, sequence data types, 53 
memory, managing, 512, 579 
memory-mapped files, 175-176 
menu widgets, 360-361 
menus, adding to wxPython, 411-412
message fingerprints, 521-523 
Message objects, 309, 320-321 
messages, adding to mailboxes, 288. See also 
e-mail; networking 

metadata 

auditing table example (mirrormaker.py), 
238-240 

sequence pieces (table), 238 
metatext, 326 

methods. See also functions and specific classes 
and objects 

for array objects, summary of, 71 
base methods, 109 
initialization methods, 100 
overloading (table), 111-112 
self referencing, 15 
MH mailboxes (MH objects), 320-321 
MIME messages 

encoding/decoding, 312 
mailcap files, parsing, 317 
mapping to file extensions, 316 
multipart messages, 313-314 
parsing, 311-313 

testing, example file (MimeTest.py), 314-315 
mimetools module, 312 
mime.types file, 316 


mimetypes module, 316-317 
MimeWriter module, 313 
mimify() function, 312
mimify module, 312 
min() function, 31, 43, 55
minimum field width number (string 
formatting), 41 
minus operator (-), 23 
in string formatting, 41 
mirrored() function, 153
mirrormaker.py (audit tool), 238-240 
mix-ins, multiple inheritance with, 104 
mkdir() function, 169
mktemp() function, 170
mktime() function, 221
mmap module/objects, 175-176 
__mod__() method, 112
mode method, values, 122-123 
mode values, open() function, 122
modes, for paths, setting, 157 
modifying attributes, 101-102 
module objects (Python/C API), 572-574 
module type values, 632 
modules. See also specific modules 
compiling/storing, 95 
copying, 66 

customizing using mix-ins, 104 
distributing/installing, 644-647 
encrypted, importing, 633-636 
extension, distributing, 650-651 
grouping into packages, 96-97 
importing, 14, 92-93, 629-631 
layout, 91-92 
locating, 94 
reading lines from, 175 
reimporting, 93-94 

retrieving from remote locations, 636-641 
modulo operator (%), 7, 23 
in string formatting, 40-41 
Monkeys.py, 96 

Monte Carlo sampling (Plotter.py), 586-587 
month() function, 224-225
monthcalendar() function, 224
monthrange() function, 225
Morsel object, 322-323 
morsels, storing cookies as, 322-323 
mouse buttons, binding, 372 
mouse cursors, customizing (wxPython), 413
mouse input, mouse events 

detecting in curses module, 423-425 
terminal displays, 423-424 
in wxPython, 412 

mousemask() function (curses module), 423-424
using (Deathray.py), 424-425
MovingAverage.py (matrix arithmetic), 603-604 
mpz module, 586-587 
msvcrt module, 121 

Windows-specific services, 669-670
__mul__() method, 112
multi-process socket server classes, 104 
multi-threaded socket server classes, 104 
multicast communications, example code for,
256-261 

MultiFile class, 313 
multiple assignment statements, 27 
multiple inheritance, 14, 103-104 
multiplication operator (*), 23 
repeating strings using, 37-38 
multiplying array matrices, 602-604 
Multipurpose Internet Mail Extensions. See MIME 
messages 

multitest.py (multicasting), 257-260 
multithreading. See threading 
MutableString class, 105 
mxODBC module, database searching using, 
235-237 

N 

name() method/function, 123, 187
namelist() method, 215
names, of variables, 19-20 
namespaces, 95, 97 

lambda (anonymous) functions, 91 
objects, 63 
in XML, 327 

native language support (NLS), 619-620 
adding to applications, 620-624 
ncurses API. See curses module 
ndiff utility, 172 
__ne__(), 109
__neg__() method, 112
nearest() method, 375
netrc files, handling, methods for, 291 


Network News Transfer Protocol (NNTP) object,
292-295 

network orders, 256 
networking, 247-248, 267 
byte ordering settings, 256 
CGI script handlers, 267-269 
connection objects, 251-252 
HTTP servers, 264-267 
multicast communications, 256-261
non-threaded communications, 269-273
sending/receiving data, 251-252 
socket module functions, 248-250 
socket servers, 261-263 
networks, moving objects between, 200-203 
new module, 614-615 
newnews() method, 293-294
NewObject statement, 15 
newpad() function (curses module), 426
new_panel() function (curses module), 426
newsgroups 
accessing, 292 
browsing, 293 

managing, methods for, 292-295 
NewsSlurp.py, 294-295 
nextfile() function, 177

NIS (Sun System) “Yellow Pages,” UNIX interface 
with, 682-683 

NLS. See native language support 
NNTP object, 292-295 
nntplib module, 292-295 
non-ASCII strings, 150 

non-managed windows (wxPython), 395-396

None value, 11 

nonprintable characters, 34 

__nonzero__() function, 109

noraw() function (curses module), 421

normcase() function, 162

normpath() function, 162

not in operator, 43 

not operator, 32 

NullWriter, 306-307 

NumberGuess.py (exception handling script), 
73-74 

NumberGuess2.py, 80-81 
numbers, pickling, 198 




numeric data types 
combining, 24 

comparison functions, 23-24 
comparison operators, 29-31 
converting to/from string data type, 43-47 
floating point numbers, 22 
functions for, 24-26 
imaginary numbers, 22 
integers, 21 
long integers, 21-22 
numeric() function, 153
numeric operators, 111-112 
Numeric Python (NumPy), 589 
array elements, 600 

array-handling functions, 590-593, 597-601 
array matrices, 602-604 
universal functions, 593-596 

O

Object Graphics Library (OGL), accessing in 
wxPython, 413 

Object-Oriented programming (OOP), 15 
classes, creating, 100-101 
Python support for, 99 
object references 
passing, 88 
variables as, 88 
object state, 196 

objects, 14. See also specific classes and objects 
class definitions, 100 
class variables, 100 
classes, 100-101 
copying, 65-67 

creating from C/C++ code, 539-540 
creating new objects, 15 
identity references, 63-65
instance objects, 101 
instance variables, 101 
keys, 10-11 

low-level, creating, 614-615 
managing attributes in, 101-102 
moving across networks, 200-203 
pickling, 198-200 
proxy objects, 117-118 
sys module, 126-127 
values, 10-11 

weak references with, 115-116 


objects, in Python/C API 
buffer interface, 566-567 
built-in types, 571 
dictionary functions, 570-571 
file objects, 571-572 
generic objects, 556-558 
list functions, 564-565 
mapping objects, 569-570 
module objects, 572-574 
number objects, 558-561 
tuple functions, 565-566 
obufcount() method (sunaudiodev module), 455
oct() function, 46
__oct__() method, 115

OGL (Object Graphics Library), accessing in 
wxPython, 413 
onButton() method, 393-394
onecmd() method (cmd module), 441
OOP. See object oriented programming
open() function, 11
with arrays, 203-204 
with audio files, 457 
creating file descriptors, 173 
gzip module, 215 
with non-ASCII strings, 152
opening files, 121-122 
with sunaudiodev module, 455 
with URLs, 278 

openlog()/closelog() functions, 674-675
openpty() function, 173

operator module, overloadable functions, 108-111 
operators 

arithmetic operators, 4 
augmented assignments, 28 
Boolean operators, 31-32 
comparison operators, 29-30 
listing of, 20 
modulo operator (%), 7 
with numeric data types, 23-24 
overloading, 111-114 
precedence rules for, 4, 33-34 
reference comparisons, 64 
string comparisons, 43 
optimizing. See performance, optimizing 
__or__() method, 114
OR operator (|), 23, 32
ord() function, 45
order modifiers, in struct module, 206
orientation, of controls, in wxPython, 403-404
os module, 156 

error exceptions, 190-191 
executing shell commands, 179-181
exiting from processes, 183-184 
file descriptor functions, 173-174 
file management functions, 163-164, 168-169 
file-opening functions, 122 
path management functions, 156-160 
process information functions, 185-186
running child processes, 181-183 
viewing environmental variables, 165 
viewing system information, 187
OSError class, 190-191 
os.path module 

building/breaking up paths, 161-162
comparison with os module, 156
file/directory management, 164 
path management functions, 157-162 
output. See also I/O; printing
audio files, 457-458 
print statement, 119-120 
WordCount.py, 13 
overloading 

bitwise operators, 114-115 
dictionary operators, 112-113 
functions, 108-111 
numeric operators, 111-112 
sequence operators, 112-113 
type conversion operators, 115 

P 

pack() methods, 208-209
widget layout, 350-351 
packages 

distributing/installing, 647-648 
grouping modules into, 96-97 
Packer objects, creating, 208-209 
packing data types, 208-209 
pads (curses module), 425-426 
palettes, 467 

parameters, in functions, 88-89 
parent classes, 14 

child classes from, 102-103 
multiple inheritance, 103-104 


parentheses (( )) 
in calculations, 4 
in complex expressions, 32-33 
in regular expressions, 141 
parsing 

HTML files, 327-329 
Lepto programs, 436-440 
Python data, 531-537 
Python code, parse trees for, 613-614 
XML documents, 327 
passwords 

managing, 516-517 
in UNIX systems, 671-672
path type test function, 160 
paths, 155 

accessing, 156-157 
locating, for modules, 94 
managing, 156-161, 168-169 
paths, managing

os module functions for, 156-157, 168-169 
os.path module functions for, 157-159, 161-163 
stat module functions for, 160-161 
statcache module functions for, 161 
pause() function, 192
pdb module (debugger), 497-498 
performance, optimizing 
I/O calls, 510-511
locating bottlenecks, 505-509 
looping, 510 
managing memory, 512 
organizing if-statements, 74 
performance statistics, 507-508
“simultaneous” code, 495-496 
sorting, 509 
sound files, 454 
string-handling, 511 
thread-handling, 511 
periods (.) 

in regular expressions, 140 
in string formatting, 42 
permissions, 157 
phone list (database), 231-232 
PhotoImage class (Tkinter module), 366-367
with Python Imaging Library, 475-476 
pi constant, in math module, 581 
pickling, 197-198
classes, 199-200 
swap module example, 200-203 
pipe() function, 173
pipes, 173 

pixel data, modifying, 476-477
playing/recording sound files 
SunOS, 455-456
Windows systems, 454-455
PlaySound() function (winsound), 454-455
plotter program, creating using shlex module, 433
plus operator (+), 23 

concatenating strings using, 37 
overloading, 108 
in regular expressions, 141 
in string formatting, 41 
pocket calculator, Python as, 4 
point class (Point.py), 15-16 
PoliteGet.py, 307 
poll() function, 270
polling objects, 270 
__pos__() method, 112
__pow__() method, 112
pop() method, 59-60
with arrays, 71 

pop arg statement (Lepto), 435 
POP3 accounts, accessing, 281-283 
popen() functions, 181-182
popitem() method, 63
poplib module, 281-283 
popmail.py, 281-282 
Popup.py (menu widget), 361 
porting threaded code, 494 
__pos__() method, 112
post() method, 295
POST requests, 280
pow() function, 25-26
__pow__() method, 112
power operator (**), 23 
precedence rules, operators, 4, 33-34 
primary orientations, in wxPython, 403-404 
PrimeFinder.py (looping statements), 7-8
print statement/command 
in pdb module, 498 
printing to file, 120 
printing, 119-120 

calendars, functions for, 224-225 


tracebacks, 605-607 
with wxPython, 414 
prmonth() function, 225
processes 

handling, functions for, 181-185 
viewing information about, 185-186
profile module/Profile class, 506-507 
programs, running, 6, 179-181. See also code, 
executing 

progress bar, creating, 387-388 
properties, of locales, 626-627
protocols, communications, 248, 275
proxy objects, 117-118 
proxy servers, 276 

pseudoterminals (UNIX systems), 681-682 

pstats module/Stats class, 507-508 

public access, versus private, 15-16 

push() method, 313

push arg statement (Lepto), 435 

pwd() method (FTP server), 289-290

pwd module, using with UNIX systems, 671-672 

py2exe utility, 655-657 

PyArg_ParseTuple object types, 533 

Py_BuildValue object types, 539-540 

.pyc files, 631

pyclbr module, browsing classes, 609 
pydoc module, 501 
PyInterpreterState objects, 576
.pyo files, 95
PyObject pointer, 553 
PyShellWindow module (wxPython), 398 
Python/C API 

built-in types, 571 
C list functions, 564-565 
CObjects, 574 

dictionary functions, 570-571 
empty values, 571 

error messages/exceptions, 576-579 
extension tools, 546-550 
file objects, 571-572 
generic objects, 556-558 
managing memory, 579 
mapping functions, 569-570 
module objects, 572-574 
number objects, 558-561 
object layers, 556 
reference conventions, 554-555 
reference ownership, 553-555
sequence objects, 561 
sub-interpreters, 576 
threads, 574-576 
tuple functions, 564-565 
Unicode strings, 567-569 
Python Database API, 234-244 
Python distribution downloads, 685-686 
Python Enhancement Proposals (PEPs), 687
Python Extensions for Windows. See win32all
Python Imaging Library (PIL)
features, 472-475 
image formatting, 480 

Python interpreter, starting and exiting from, 3-4 

Python MegaWidgets (Pmw), 389 

Python mode for Emacs, 692-669 

Python Threading SIG, 496 

PythonPath variable, 94 

PythonWin, 698-699 

PythonWorks, 690

PyZipFile class, 216 

Q 

querying relational databases (soundex.py), 
235-237 

question mark (?), in expressions, 141 
Queue module/Queue class, interweaving threads 
using, 495 

Quiet.py (audio editor), 594-595 
quote() function, 276
quoted-printable encoding, 319 
quotes, in string literals, 35 

R 

r amnt statement (Lepto), 435 
_radd_0 method, 112 
raising exceptions, 82 
random numbers, generating, 583-587
deck shuffling example, 585 
distributions for, 585 

Monte Carlo sampler (Plotter.py), 586-587 
range() function, 8, 50-51

with looping statements, 76-77 
ranges, in lists, managing, 50-51 
raw() function, 421
raw mode (curses module), 421 
raw_input() function, 120-121
__rdiv__() method, 112


__rdivmod__() method, 112
re module. See regular expressions 
read() methods, 125
with audio files, 461 
with ConfigParser object, 188 
with file descriptors, 173 
with mmap objects, 176 
with sunaudiodev module, 455 
with Telnet objects, 297 
with ZipFile objects, 215 
read_byteO method, 176 
readframesO method, with audio files, 457. 

See also chunked data 
reading file contents 

chunked audio files, 460461 
nonchunked audio files, 457 
text files, 125-126 
readlineO method, 125-126 
with mmap objects, 176 
readline module, 121 

in UNIX Systems, 675-678 
readlinesO methods, 126 
Real Media File Format (RMFF) files, reading, 
460-461 

recursive grep utility (rgrep.py), 166-167 
recv()/recvfrom() methods. See networking
redirected input, detecting, 128
ref() function, weak referencing, 116
references, object
comparing, 63-64
counting, 64-65
and memory management, 512-513
tracking ownership of, 553-555
weak references, 115-116
refresh() method (curses module)
with pads, 426
with windows, 418
register() function, 184
with Web browsers, 308
regular expressions
character groups, 142-143 
creating, 144-145 
extensions, 143-144 
nongreedy matching, 143 
syntax, 140-141 
using, 145-147 
reimporting modules, 93-94 
relational databases
accessing, 234-235 
auditing tables, 238-240 
column types, 240-241 
database libraries, 242 
error hierarchies/exceptions, 243-244 
input/output sizes, 241 
metadata, 237-240 
SQL statement parsers, 242 
relative paths, 155 
reload() function, 93-94
Remainder.py, 599-600 
remainders, calculating, 7 
remote importer, 637-640 
remote server access, 296-298 
remove() function, 168-169
remove() method, 59
with arrays, 71
removedirs() function, 170
removing. See also deleting; exiting
attributes, 101-102
files, 169
rename()/renames() functions, 168-169
renaming paths, 168-169
repeat statement (Lepto), 450
repeat count sub statement (Lepto), 435
repeating sequences, 52-53
repeating strings, 37
replace() method, 137
replacing substrings, 137
report() method, 172
repr() function, 46-47
__repr__() method, 109
request handlers, 262-264
Request objects, managing HTTP files/URLs using,
278-279
reserved words, 19
reset arg statement (Lepto), 435 
reshaping, array objects, 600-601 
resizing objects, 476, 600-601 
resource module
resource limit settings, 679-680
UNIX system usage information, 678-679
resource usage (UNIX systems), 677-678
retr() method, 281
retrieve() method, 278
return-statements, 88
reverse() method, 60
with arrays, 71 
ReverseSound.py, 459-460 
rexec module/RExec object, 517-520 
rfc822 module (e-mail handling) 
e-mail address lists, 310-311 
handling MIME messages, 311-312 
parsing e-mail headers, 309-310 
RFCs (requests for comments), 275 
rfind() method, 136
RGB (red-green-blue) color system, 469-470 
converting HLS system to (choosecolor.py), 
471-472 
rgrep.py, 166-167 
rich comparison methods, 110 
right bit-shift operator (>>), 24
rindex() method, 136
rjust() method, 134
rlcompleter module, 675-678 
RLock class, concurrency control, 486-487 
__rlshift__() method, 114
RMFF (Real Media Format) sound files, handling,
460
__rmod__() method, 112
rmtree() function, 169
__rmul__() method, 112
robot programs, 307-308
Web robot example (Robot.py), 331-334
RobotFileParser object, 307-308
robotparser module, 307-308
rotor module/rotor objects, 523-524
round() function, 26, 45
rounding, 7, 581-582
__rpow__() method, 112
__rshift__()/__rrshift__() methods, 114
rstrip() method, 134
__rsub__() method, 112
ruimp.py (remote importer), 638-640
run() method, 483-484
running programs, 6, 500. See also code, executing

S

sample rates/widths (sound files), 453
sandboxes, 517-520
saving objects into databases, 203-203
SAX (Simple API for XML), using, 327, 334-337
scale amnt statement (Lepto), 435
scale widget, 376
sci() function, 153
scope rules, 95-96
screen-scraping (curses module), 417-418 
scripting languages (Lepto), 433-450 
scrollbars
with widgets, 376-377
with wxPython windows, 396
search() method, 145, 147, 286
searching
databases, 235-237 
dates, calendars, 225-226 
files, grep utility for, 166-167 
match objects, 147-148 
newsgroup articles, 293-294 
regular expressions, 145 
robot programs for, 331-334 
sound fragments, 462 
strings, methods for, 135-136 
Web searches, 280
secondary orientations, in wxPython, 404
Secure Hash Algorithm (SHA), 522-523
security issues, CGI scripts, 302
security tools
encryption, 523-524
message fingerprints, 521-523
passwords, 516-517
restricted environments, 516-521
seek() method
changing current position within file, 123-124
with mmap objects, 176
select() method (IMAP4), 286
select module, non-threaded communications,
270-273
selection_set() method, 376
self-examining code (introspection), 608-611
self references, 15, 100
Semaphore class (concurrency control), 487
send() method/function, 203, 252
sending/receiving e-mail, 291-285
sendmail() method, 283
sendto() method, 252
sequence data types, 9-10, 49
accessing portions of using slices, 54 
accessing portions of using subscription, 53 
comparing, 53 
joining/repeating, 52-53 
membership testing, 53
processing functions, 55-57
unpacking, 54 

sequence operators, overloading, methods for, 
112-113 
set_0 methods
with Telnet object, 297
with wxPython controls, 402, 411
SetAcceleratorTable() method, 411
setattr() function, 102
__setattr__() method, 109-110
setblocking() method, 253-254
SetCursor() method, 413
setdefault() method, 61
setfirstweekday() function, 225-226
setinputsizes()/setoutputsizes() methods, 241
__setitem__()/__setslice__() methods, 113
setparams() method, using with audio files, 457
SetPosition() method, 402
SetScrollbars() method, 395
SetSize() method, 402
setsockopt() method, 254-255
SetStatusBar() method, 395
SetToolBar() method, 395
setup functions, embedding, 542-543
setup() method, 263
setup.py
customizing, 650 
for package distributions, 648 
for simple distributions, 644 
SGML (Standard General Markup Language), 325 
shallow copies, 65-67 
shared libraries (UNIX systems), 675
shell commands, executing, 179-181
shelve module storage functions, 203-204
shlex module/shlex class, 433, 436-437, 522-523
Lepto parser program (leptoparser.py), 437-440
shutdown() method, 251
shutil module
file management functions, 169 
path management functions, 168 
signal handlers, customizing, 191-193 
signal module, asynchronous signal handling, 
191-193
sig.py (signal handler), 192-193
Simple API for XML. See SAX 
simple.c extension program, 529-530 
SimpleCookie class, 322 
SimpleHTTPRequestHandler class, 266 
SimpleHTTPServer module, 264
Simplified Wrapper and Interface Generator. See 
SWIG
single underscore (_) character, 20
size() function, 176
size() method, with widget listboxes, 375
size modifiers, struct module, 206
size options (Tkinter widgets), 354 
sizers (wxPython), 403 
box sizers, 403-405 
grid sizers, 405-406 
sleep() function, 221
slice operators (slicing)
with array elements, 590-592 
copying objects using, 65-66 
with sequence data types, 54 
with strings, 39-40 
SMTP accounts, 283-285 
smtplib module, 283-285 
sndhdr module, 456 
socket module/socket objects, 248-250
asynchronous dispatcher class, 271-273
binding/connecting, 251-252
calling, 117
communications options, 254-255
copying, 66
creating, 250-251
message handling, 251-253
managing, 250-251
open sockets, viewing, 247
ports/IP addresses for, 253-254
socket() function, 250-252
socket servers, 261-263
SocketServer module
modifying using mix-ins, 104 
TCP/UDP subclasses, 261-263 
softspace attribute, in file objects, 124 
sorting
array objects, 601 
lists, managing items in, 60 
optimizing, 509-510 
sound files
AIFF files, 456-458
AU files, 458
components/features of, 453-454
converting formats, 462-463
managing sound in, 463-464 
playing/recording, 454-456 
reading/writing, 456-461 
reversing sound on, 459-460 
storing, 456 
WAV files, 458
soundex.py (database query), 235-237
source code editors, 691-695
Emacs editing tools, 691-692
Integrated DeveLopment Environment (IDLE),
692-695
making Python-aware, 692
source distributions
controlling files in, 648-649 
creating, 651-653 
span() method, 148
spawn() functions, 182-183
special characters, unpacking in C/C++
conversions, 536
spiral.py, 421
splitfields() function, 140
splitlines() method, 137-138
splitting
paths, 162
regular expressions, 146-147
substrings, 137-138, 140
windows in wxPython, 395-396
SplitVertically() method, 395
SQL statement parsers, 242
square roots, calculating, 582
stack traces, printing, 606-607
stacking windows (curses module), 426
StackPrint.py, 606-607 
standalone applications, building tools 
archives and standalones, 657 
freeze, 656 
py2exe, 655-656
standard I/O, accessing, 126-127
start() method, 148
with Thread object, 483-484
startbody() method, 313
startfile() function, 180
starting Python interpreter, 3
startswith() method, 136
stat() function, 159-160
stat module, index names (table), 160
statcache module, 161
statements. See also specific types of statements 
class definitions, 15 
function definitions, 5-6 
grouping by indentation level, 6-7 
Lepto-supported, 435 
types of, 6-8 
statements, assignment 
augmented, 27-28 
multiple, 27 
simple, 26-27
static extension module linking, 531 
statistics, code performance, 507-508 
status bars, adding to windows (wxPython), 395-396
status() method (IMAP), 288
statvfs module (UNIX system information), 678-679
stderr/stdin/stdout objects, 126-128
store() method (IMAP4), 286-287
storing
modules, 95 
objects, 203-204 
sound files, 456 
ufuncs output, 594 
str() function, 46
__str__() method, 109
StreamRequestHandler, 264
strerror() function, 190
strftime() function, 222
string class, customizing, 105
string data type, 34
accessing characters/substrings, 38-40
converting from numeric data type, 45-47
converting to numeric data types, 44-45
escape sequences with, 36-37
formatting, 40-42
length, 35
string comparisons, 42-43
string literals, 35
escape sequences in, 36-37
raw strings, 37
Unicode strings, 43
string module, 133
atof() function, 139
atoi() function, 139
atol() function, 139
capwords() function, 139
character categories, 138-139
join() function, 138
joinfields() function, 140
maketrans() function, 139
splitfields() function, 140
StringIO class, 149-150
strings (string objects)
C functions for, 563-564
characters/substrings in, 38-40
comparing, 42-43
concatenating, 37
converting arrays to, 592-593
converting to hexadecimal values, 75-76
formatting, 40-42, 134-135
handling as files, 149-150
immutability of, 133
non-ASCII, 151-152
optimizing, 511
pickling, 198
regular expressions, 140-147
repeating, 37-38
searching, 135-136
Unicode strings, 150
strip() method, 134
strptime() function, 222-223
struct module
converting to/from C structures, 204-207 
format characters, 204 
order, alignment and size modifiers, 206 
styles, with Internet text, 305-306 
sub-interpreters, 576
sub() method/function, 146-147
__sub__() method, 112
sub name statement (Lepto), 435
sublists, accessing, 9-10
subn() method, 146
subscribe() method, 287
subscription operators, 38
with sequence data types, 53-54
substituting in expressions, 146
substrings, 9-10
accessing, 38-40
managing/editing, 137-138
searching for, 136
subtraction operator (-), 23
sunau module, 456-458 
sunaudiodev module, 455-456 
SunOS, using sound files in, 455-456 
swapaxes() function, with arrays, 603
swapcase() method, 135
swap.py (swap module), 200-203
SWIG (Simplified Wrapper and Interface
Generator), 546-549
symlink() function, 168-169
syntax. See also specific classes, functions, objects, 
and statements 
case-sensitivity, 5 
class definitions, 15 
creating new objects, 15 
regular expressions, 140-141 
simple assignments, 26-27 
variables and expressions, 4-6 
SyntaxHighlighter.py, 612-613 
sys module
stderr (standard error) object, 126
stdin (standard input) object, 126
stdout (standard output) object, 126
sys.argv variable, 166
sys.getrefcount() function, 64-65
syslog module
openlog()/closelog() functions, 674-675
priority values, 673
system() function, 179-181
system information
functions for, embedding, 542
viewing, 187
system logger (UNIX systems), 673-675
SystemExit exception, 184

T 

tags, in markup language, 325-326. See also 
XML format
HTML methods, 328-329 
rules for, 326-327 
TCPServer class, 104, 261-263 
technical assistance Web sites, 686 
tell() method, 123, 176
with mmap objects, 176
Telnet protocol/Telnet class, 296-298
telnetlib module, 296-298
tempfile module, 170-171
template file rules, adding, 648-649
tempnam() function, 171
temporary files, creating, 169-170
TemporaryFile class, 170
terminal displays
curses module functions, 415-432
screen masks for (mask.py), 419-420
in UNIX systems, 681-682
termios/TERMIOS modules, 681-682
TestCase class, 504-505
testing code, tools for
automating, tools for, 502 
doctest module, 502-503
rexec security access (bad.py), 519
unittest module, 503-505
testing, remote importer, 640-641
TestSuite class, 504
testzip() method, 215
text
displaying on terminals, 415-432
editing, 426-427
encoding, 150
formatting, user input for (Userinput.py),
357-359
Internet, formatting, 304-307 
string data type for, 34 
text editors
Tkinter module example, 362-365
Windows API example, 663-664
in wxPython, 401
text files, accessing lines in, 174-175
text mode
data storage, 195
opening files in, 123-124
text widgets, 359-360
TextBox class (curses module), commands for
(table), 426-427
TextEditor.py, 362-365
win32all example, 663-664
textpad module (curses module), 426-427
thread module
creating new threads, 482-483
locking using, 485-486
URLGrabber script example, 492-494
threading module/Thread object, 482-491
checking thread status, 484
locating threads, 484-485
locking using, 486-488
starting/stopping threads, 483-484
URLGrabber script example, 489-491
threading process, 481-482
concurrency issues, 484
interweaving threads, 495
optimizing, 511
porting threaded code, 494
preventing deadlock, 488-489
threading example, 489-494 
ticks
in calendar module, 224
converting from/to, 221
telling time using, 219
tilde (~) character (inversion operator), 23
time
formats for, 221
formatting, functions for, 222-223
handling in wxPython, 413
localizing, 223-224
parsing, 222-223
time module/time() function, 159, 219
date/time formatting, 222-224
handling time zones, 226
stopwatch functions, 220-221
Timer.py, 367-369
timestamps, creating, 222
TimeTuples, converting, 221, 223
timeutil module, setting up, 644-647
title() method, 135
titlecase, converting strings to, 135 
Tkdnd module, 382-385 
Tkinter module, 347-348, 354 
adding widgets, 387-388 
breakfast buttons example (FoodChoice.py), 
352-354
color options, 354, 365 
color scheme customizer, 377-381 
cursor options, 385-387 
dialog/message boxes, 361-365, 381-382 
drag-and-drop support, 382-385 
drawing canvas, 373-374 
font options, 366 
geometry manager, 349-350 
graphics images in, 366-367 
interface-building widgets, 348-349 
in Lepto-based GUI, 445-450 
listbox widget, 375-376 
menu widgets, 360-361 
moving images, 368-369
with Python Imaging Library, 475-476 
scale widget, 376 
scrollbar widget, 376-377 
size options, 354 
text editor example, 362-365 
text widgets, 359-360 
timers with, 367-369 
user input with, 356-359
tkMessageBox, 361-362
tkSimpleDialog module, 381-382
tmpfile() function, 171
tmpnam() function, 171
tofile() method, 70
tokenize module/function, 611-613
tolist() method, 69
viewing contents of range objects, 77
toolbars, adding to windows (wxPython), 395-396
tostring() method, 70
traceback module, 603-604 
printing GUI exceptions, 607 
printing stack traces, 606-607 
printing tracebacks, 605-606 
transactions, in databases, 234-235 
translating substrings, 137 
maketrans() function, 139
string module functions, 139-140
tree controls (wxPython), 400-401
treedemo.py (wxPython module), 400-401
trigonometric functions, 582-583
truncate() method, 124-125
ttyname() function, 173
tuples, 9
ASTs, 614
C functions for, 565-566
creating, 52
for...in-statements with, 55
passing arguments from, 90
pickling, 198
processing functions, 55-57
switching to lists from, 10
TimeTuple, 220
unpacking, 40
two-digit years, enabling, 227 
type codes (array objects), 69 
type conversion operators, 115 
types module/type() function, 67-68
types, sequence vs. immutable, 9-10

U

UDPServer, 261-263
ufuncs (universal functions), in NumPy, 593-597
audio editing program, 594-595
repeating/iterating, 595-597
uid() method, 287
uidl() method, 281
unary operations, 23 
Unicode strings, 43, 150 
managing, 153 
in Python/C API, 567-569 
Uniform Resource Locators. See URLs 
unittest module, 503-505 
UNIX mailboxes, 320-321 
UNIX systems
accessing Python, 3-4
accessing system logger, 673-675
CGI scripts for, 299
child processes, 183
controlling resource use, 679-680
epochs, 219
exiting from processes, 184-185
file descriptors, 680-681
inodes, 158
passwords, groups, 671-672
proxy servers in, 276
PythonPath variable, 94
reading individual characters, 121
running Python programs, 6
signal handlers, 192
system information, viewing, 187, 677-679
temporary files, managing, 171
terminals, pseudoterminals in, 173, 681-682
wildcards in, 167 
UnixDatagramServer, 261 
UnixStreamServer, 261 
unpacking data types, 54, 210 
unquote() function, 276
unsubscribe() method, 287
uploading files (FTP), 290-291
upper() method, 134
urlcleanup() function, 276
URLGrabber scripts (threading), 489-494
urljoin() function, 303-304
urllib library, 276
urllib2 library, 278
URLopener, 277-278
urlparse module/function, 277, 303-304
urlretrieve() function, 276
URLs (Uniform Resource Locators)
handling as files, 277
managing, 276-277, 303-304
opening/accessing, 277-278, 308
retrieving, 276
user input
in curses terminal displays, 421-425
in GUIs, 356-359
reading, 120-121
user interfaces. See also GUIs
Internet formatter interface, 304-305 
Lepto-based, 433-450 
UserDict module/UserDict class, 106 
Userinput.py, 357-359 
UserList module/class, 104-105 
UserString module 

MutableString class, 105 
UserString class, 105 
UsingNew.py, 615 
utime() function, 159
uu module/uuencoding algorithm, 317-318

V 

valid identifiers, examples of, 19 
values, 10-11, 14 
built-in, 11 

hash values, retrieving, 62 
referencing in variables, 5 
in Windows registry, 664 
variables, 88
assignment statements, 26-28
class variables, 100
creating, 26
defining, scope rules, 95-96
environmental, 94
instance variables, 101
naming, 19-20
value references in, 5
vectors, adding, 108
verify() method, 284
version numbers, of software, checking, 610-611
vertical slash (|), 141

W

wait status interpretation functions, 184-185
WaitCursor.py (Tkinter module), 386-387
walk() function, 164
warnings, in Python/C API, 578-579
WAV files
reading/writing, 458 
reversing (ReverseSound.py), 459-460 
weak references, handling, 65, 115-117
weakref module
creating weak references, 116
getweakrefcount() function, 116
getweakrefs() function, 116
mapping() function, 117
proxy() function, 117
Web browsers
creating/managing, 264-269
viewing files in, 308
Web requests, sending/receiving, 279-280
Web Robot, 331-334
Web servers
cookies, 321-323
creating/managing, 264-269
documentation Web server, 501
Web sites
extension classes, 550 
Python MegaWidgets, 389 
Python downloads, 685-686 
Python Enhancement Proposals (PEPs), 687 
Python Imaging Library, 472, 480 
tutorials, 17
wxPython for, 391
WebSearch.py, 280
weekday() function, 225
where command (pdb module), 498
where() function, with arrays, 602
while-statements, 79
in looping statements, 8
widgets (Tkinter module), 349
appearance options, 355
behavior options, 355
building GUI with, 348-349
color options, 354, 365
color scheme customizer, 377-381
creating drawing canvas, 373-374
cursor options, 385-387
designing/customizing, 387-388
dialog/message boxes, 361-365
event handlers/objects, 371-373
font options, 366
geometry managers, 349-350
grid method options, 351-352
graphics handling, 366-367
layout constraints, 406-407 
listbox widget, 375-376 
MegaWidgets Web site, 389 
menu widgets, 360-361 
packer methods, 350-351 
scale widgets, 376 
scrollbar widgets, 376-377 
size options, 355 
text widgets, 359-360 
timers with, 367-369 
user input, incorporating, 356-359
width fields, 42 
_winreg functions, 668-669 
win32all (Python Extensions for Windows) 
accessing Windows registry, 664-669 
data type dictionaries, 661-662 
error messages (exceptions), 662 
setting Internet Explorer home page, 666
text editor, 663-664 
win32api functions, 668-669 
WindowObject class, 415 
Windows API wrappers, 661-664 
Windows Internet Information Server (IIS), CGI 
Scripts for, 298-299 
windows (curses module)
managing, 425-426
refreshing, 418
Windows registry, 661
access constants (table), 665
accessing, 664-669
killing keys in (KillKey.py), 667-668
Windows systems
accessing Python from, 3-4
epochs, 219
opening text mode files, 123-124
playing sound files, 454-455
proxy servers in, 276
PythonPath variable, 94
reading individual characters, 121
running Python programs, 6, 180
WingIDE, 690
winsound module, 454-455
WordCount.py, 12-13
working directory, viewing, 165
wrapper() function (curses module), 416
write() method/function, 124
with ConfigParser object, 190
with file descriptors, 173
with mmap objects, 176
with sunaudiodev module, 455
ZipInfo object, 215
write_byte() method, 176
writeframes() method, with audio files, 457
writelines() method, 124
writepy() method, 216
writing to files, methods for, 124
wxAcceleratorTable class
wxApp object, 393 
wxBitmap class, 413 
wxBoxSizer classes, 403 
wxButton classes, 399 
wxcanvas.py, 409-411 
wxCalendar class, 413 
wxChoice class, 400 
wxClipboard class, 413 
wxDataFormat/DataObject classes, 413 
wxDate/wxDateTime classes, 413 
wxDialog class, 394, 399 
wxDraglmage class, 413 
wxDropSource/DropTarget classes, 413 
wxEditor class, 401 
wxEvent class, 394 
wxFloatBar class, 398
wxFont/wxFontData classes, 413-414 
wxFrame class, 394 
wxGrid class, 395 
wxGridSizer class, 405-406 
wxHTMLWindow class, 395 
wxIcon class, 413
wxImage/wxImageHandler classes, 413
wxMask class, 413
wxMDIChildFrame/ParentFrame classes, 396
wxMDIClientWindow class, 396
wxMenu class, 411
wxMVCTree class, 401
wxNewId() function, 394
wxNotebook class, 398
wxPalette class, 413
wxPanel class, 399
wxPrintDialog/wxPageSetUpDialog classes, 414
wxPrintPreview class, 414 
wxPyEditor class, 401 
wxPython module, 391-392 
built-in dialogs, 407-408 
common controls, 399-400
cursors, drag-and-drop, 413 
device context classes, 408-411 
drawing in, 409-411 
editor controls, 401
example program, 392-394 
formatting options, 413-414 
HTML handling options, 414 
keyboard input options, 412 
layout options, 401-407 
menus, keyboard features, 411-412 
mouse options, 412 
printing options, 414 
tree controls, 400-401
window options, 394-398 
wxResourceParseFile object, 403 
wxScrolledWindow object, 395 
wxSplitterWindow class, 395 
wxStatusBar class, 395 
wxTimeSpan class, 413 
wxToolbar class, 395 
wxWindow class options, 394-398 

X 

XDR (eXternal Data Representation) format, 
converting to/from, 208 
xdrlib module, 208 
xgtitle() method, 294
xhdr() method, 294
XML format, 325 
DTDs, 326 
namespaces, 327 
parsing, 334-343 
processing functions, 327
saving XML files, 210 
XML handlers, 343 
xmllib module, 327 
features, 341-342
parsing XML, example (BloodType.py), 342 
XMLParser class, 341-343 
XMLReader class, 336-337 
xml.sax module, 334 
XOR (exclusive-or) operator, 23
__xor__() method, 114
xover() method, 294
xrange() function, 51
xrange objects, 77
xreadlines() function, 126

Y 

yanking, 677
years, two-digit, enabling, 227
YIQ color system, 470

Z

zero arguments, unpacking in C/C++ 
conversions, 538 
zeros, leading, in strings, 41 
zfill() function, 140
zipfile module, 214-215 
ZipInfo class, 215-216
zlib module
zones, time, handling, 226 



